SPEC OSG SPECmail2009 Benchmark
Workload Characterization for SPECmail_Ent2009 Metric

Mike Abbott, Yun-seng Chao

December 2008


 

Summary

This document summarizes studies of mail server workloads collected from multiple university and corporate sources, using a variety of IMAP4 clients. The analyzed workloads consist of both SMTP and IMAP4 requests. Each request is described by parameters that fully characterize its behavior. The proposed models, obtained by analyzing these parameters, reproduce the behavior of the observed mail server workloads.

Document Organization

The report is organized as follows. We start with a description of the measurements and of the parameters considered in our studies. We then present the models characterizing the mail server workloads and we briefly describe how to use these models.

 

SPECmail2009 Additions/Changes

Much of this document discusses the workload changes between the new SPECmail2009 benchmark and the original SPECmail2008 benchmark workload. Many of the internal distributions were updated with complete message and folder profiles provided by Apple, Inc. in 2008. Most of this data replaces the original message and mailbox composition distributions. The SMTP traffic levels have been incorporated into the recipient and message size distributions.

One workload addition not discussed in this document is the ability to test using encrypted TCP connections. The reason lies in where this encryption incurs its cost. The e-mail clients issue commands according to user or programmatic directives, regardless of the network connection's encryption mode. Empirical data shows that both the SUT and the e-mail clients require extra computing and/or memory resources when encryption is used. Therefore, the benchmark's Secure metric influences the number of concurrent network sessions and the interarrival times, but not the actual command sequences. The two SPECmail2009 metrics show the effects of encrypted network connections on the SUT.

Measurements and Parameters

The measurements analyzed in our studies come from different sources. The measurements related to SMTP and IMAP4 were provided by four companies and two universities. The collected sessions were divided into five IMAP4 and two SMTP groups. The sessions within each group form the basis for all of the parameters that define the Enterprise User Profile emulated by the SPECmail2009 benchmark.

 

IMAP Information Sources – Enterprise

| Data Source | Total Number of Users | Number of IMAP Users | Data Source Type | Network Type |
|---|---|---|---|---|
| Mirapoint | 223 | 223 | Small company | LAN |
| Openwave | 2500 | 500 | Medium company | WAN |
| Sun | 147 | 147 | Medium workgroup | LAN |
| Apple | 39,970 | ~30,000 | Large corporation | LAN/WAN |
| University of Wollongong | Unknown | | Medium University | LAN |
| Purdue University | Unknown | | Medium University | LAN |
| SPECmail2009 (Enterprise Model) | 42,000+ (250 Minimum) | 32,000+ (250 Minimum) | Enterprise (Small to Large) | LAN/MAN (0% dialup) |
| SPECmail2008 (Enterprise Model) | 250 (Minimum) | 250 (Minimum) | Enterprise (Small to Medium) | LAN/MAN (1% dialup) |
| SPECmail2001 (Dialup ISP Model) | 10,000 | 10,000 | Consumer | Dialup (98% dialup) |



Mailbox and Message Structures

The IMAP4 protocol allows email clients to create and maintain any number of folders and subfolders, in addition to the standard Inbox folder used in the SPECmail2001 POP3 user profile. The IMAP4 command set also allows email clients to ask the server to describe these structures. This information is independent of the delivery or retrieval protocols and so is treated outside of any specific protocol and/or server context.

Multipurpose Internet Mail Extension (MIME) Profile

MIME is the Internet standard for structuring message content and attachments, defined as a formal standard by RFCs 1521, 1522, and 1523. The Sun and Apple data sets provided detailed information about mailbox and message structure. Thus they form the basis for the following probability distribution tables used in the benchmark.

The initial processing of all message sizes distinguished between single part sizes and multipart sizes. The IMAP4 benchmark prioritizes individual MIME part size over the global message size distribution.

Single Part messages (Sun: 76% of total, Apple: 47% of total)

  1. Use “Content-type: text/plain” or no content-type at all in message headers
  2. Use subpart content size distribution

 

Multipart Messages (Sun: 24% of total, Apple: 53% of total)

  1. Use “Content-Type: multipart/mixed; boundary="xxxxxxxxx-counter"” or “Content-Type: multipart/alternative; boundary="xxxxxxxxx-counter"” in message headers
  2. Use the distributions for message part width and depth to establish the set of multipart message bodies.
  3. Categorize MIME messages into one of these pre-defined multipart buckets.
  4. Use the subpart content size distribution to define the sub-part sizes in the fixed pool of pre-defined multipart messages.
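As a rough illustration of the two construction paths above, the sketch below builds a single-part and a multipart message with Python's standard email package. The body sizes and text are placeholders, not benchmark values.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_single_part(body: str) -> MIMEText:
    # Single-part path: text/plain, body size drawn from the subpart
    # content size distribution
    return MIMEText(body, "plain")

def build_multipart(bodies) -> MIMEMultipart:
    # Multipart path: the number of bodies plays the role of the top-level
    # part width; each subpart size is drawn independently
    msg = MIMEMultipart("mixed")
    for body in bodies:
        msg.attach(MIMEText(body, "plain"))
    return msg

single = build_single_part("x" * 256)
multi = build_multipart(["x" * 512, "y" * 1024])
print(single.get_content_type())  # text/plain
print(multi.get_content_type())   # multipart/mixed
print(len(multi.get_payload()))   # 2
```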

Below are the distributions used in constructing messages in compliance with the MIME standard.

MIME Part Size (bytes) vs. Probability Distribution

| Part Size | Probability (Sun) | Probability (Apple) |
|---|---|---|
| 0 | N/A | 0.04% |
| 1 | N/A | < 0.001% |
| 2 | 0.6% | < 0.01% |
| 4 | 0.1% | < 0.01% |
| 8 | 0.4% | < 0.01% |
| 16 | 0.8% | < 0.01% |
| 32 | 1.8% | 0.05% |
| 64 | 4.1% | 0.31% |
| 128 | 7.2% | 5.18% |
| 256 | 10.5% | 2.28% |
| 512 | 15.6% | 6.37% |
| 1 KB | 13.6% | 9.22% |
| 2 KB | 13.9% | 18.00% |
| 4 KB | 13.4% | 28.97% |
| 8 KB | 8.5% | 11.37% |
| 16 KB | 4.3% | 6.46% |
| 32 KB | 2.3% | 3.91% |
| 64 KB | 1.2% | 3.02% |
| 128 KB | 0.7% | 1.88% |
| 256 KB | 0.4% | 1.21% |
| 512 KB | 0.3% | 0.68% |
| 1 MB | 0.2% | 0.45% |
| 2 MB | 0.1% | 0.27% |
| 4 MB | N/A | 0.19% |
| 8 MB | N/A | 0.10% |
| 16 MB | N/A | 0.03% |
| 32 MB | N/A | 0.01% |
| 64 MB | N/A | < 0.01% |

 

MIME Distribution Chart
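Sampling a part size from a bucketed table like the one above is a simple inverse-CDF walk. The sketch below uses a small subset of the Apple column, renormalized inside the function, for illustration only.

```python
# (size in bytes, probability) pairs -- an illustrative subset of the Apple
# column, so the masses are renormalized before sampling
BUCKETS = [(256, 0.0228), (512, 0.0637), (1024, 0.0922),
           (2048, 0.1800), (4096, 0.2897)]

def sample_bucket(buckets, u):
    # Walk the cumulative distribution: scale u to the subset's total mass,
    # then subtract each bucket's mass until u falls inside a bucket
    u *= sum(p for _, p in buckets)
    for size, p in buckets:
        if u < p:
            return size
        u -= p
    return buckets[-1][0]  # guard against floating-point round-off

print(sample_bucket(BUCKETS, 0.0))    # 256
print(sample_bucket(BUCKETS, 0.1))    # 512
print(sample_bucket(BUCKETS, 0.999))  # 4096
```

In the benchmark itself, u would come from a seeded random number generator so that runs are reproducible.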

 


The following tables show the distribution of the number of MIME parts at the top level (without regard to nesting). It reflects the count of multipart/mixed parts immediately “attached” to the main message. It does not reflect any counting of multipart/alternative parts (i.e. text/plain and text/html, alternative formats of the same text). Nor does it reflect the MIME attachment depths (“attachments” to “attachments” or forwarded messages).

 

MIME Top-Level Part Counts Distribution

| Part Count | Probability (Sun) | Probability (Apple) |
|---|---|---|
| 0 | N/A | 46.69% |
| 1 | 75.76% | 3.77% |
| 2 | 21.91% | 46.20% |
| 3 | 1.99% | 2.51% |
| 4 | 0.24% | 0.29% |
| 5 | 0.09% | 0.26% |
| 6 | N/A | 0.06% |
| 7 | N/A | 0.07% |
| 8+ | N/A | 0.15% |

 

MIME Parts Chart

 

 

The next tables show the distribution of the nested MIME Part Levels that occur within a given message from the sample of MIME parts. It generally reflects messages or attachments which are forwarded multiple times, each time adding another depth level to the resulting message.

 

Distribution of MIME Part Depths

| Part Depth | Probability (Sun) | Probability (Apple) |
|---|---|---|
| 0 or 1 | 91.24% | 90.18% |
| 2 | 7.73% | 9.14% |
| 3 | 0.87% | 0.62% |
| 4 | 0.13% | 0.04% |
| 5 | 0.03% | 0.01% |
| 6+ | N/A | < 0.01% |

 

MIME Depth Chart

 

The following tables show the distribution of primary MIME Content Type (not including subtype) of all the parts in the entire sample.

 

MIME Content Type Distribution

| Content Type | Probability (Sun) | Probability (Apple) |
|---|---|---|
| TEXT | 92.193% | 86.584% |
| APPLICATION | 4.265% | 6.971% |
| MESSAGE | 2.633% | 0.465% |
| IMAGE | 0.888% | 5.943% |
| AUDIO | 0.016% | 0.018% |
| VIDEO | 0.004% | 0.019% |

 

MIME Types Chart

After Sun's values were reviewed, a former employee noted that the Unix company that provided the MIME distributions tended toward plain-text messages. Other companies have more and larger MIME parts with richer, non-textual content, such as word processor documents, presentations, spreadsheets, web pages, calendar events, images, audio, and both rich and simple alternate MIME structures. The major effect of this shift is to increase overall message sizes and to decrease the Text content type in favor of the other categories.

However, increased use of Alternate structures does not eliminate the Text portion's counts; it just increases the counters for the other content types. Also, the IMAP server is not required to interpret the actual content of MIME parts. It must extract the MIME part(s) and send the content, as is, to the IMAP4 client, which performs the interpretation. Therefore, the shift in Content Type distribution affects the MIME structure of the messages the benchmark delivers to the SUT. The SUT must still deconstruct these MIME structures, but not the actual content.

 


Messages Per Folder

The following tables show the distribution of messages in folders at the first five levels.

Level by Level Message Probability Distributions - Mirapoint, Openwave, Sun

| Top Level Width | Prob. | Level 1 Width | Prob. | Level 2 Width | Prob. | Level 3 Width | Prob. | Level 4 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.4% | 0 | 8.1% | 0 | 6.1% | 0 | 6.8% | 0 | 1.0% |
| 1 | 21.5% | 1 | 31.9% | 1 | 48.1% | 1 | 49.5% | 1 | 81.4% |
| 2 | 3.4% | 2 | 4.6% | 2 | 3.2% | 2 | 3.2% | 2 | 1.0% |
| 3 | 2.8% | 3 | 2.9% | 3 | 2.1% | 3 | 3.2% | 3 | 1.0% |
| 4 | 2.1% | 4 | 2.4% | 4 | 2.7% | 4 | 2.2% | 5 | 1.0% |
| 5 | 2.1% | 5 | 2.0% | 5 | 1.5% | 5 | 2.0% | 6 | 2.9% |
| 6 | 1.7% | 6 | 1.7% | 6 | 2.3% | 6 | 1.8% | 20 | 4.9% |
| 7 | 1.2% | 7 | 1.6% | 7 | 1.6% | 7 | 1.8% | 30 | 2.0% |
| 8 | 1.5% | 8 | 1.1% | 8 | 1.5% | 9 | 2.0% | 40 | 1.0% |
| 9 | 1.5% | 9 | 1.1% | 9 | 1.2% | 10 | 1.1% | 80 | 2.0% |
| 20 | 7.3% | 10 | 1.3% | 20 | 7.8% | 20 | 10.3% | 200 | 2.0% |
| 30 | 5.2% | 20 | 7.8% | 30 | 3.8% | 30 | 4.1% | | |
| 40 | 3.0% | 30 | 5.3% | 40 | 3.1% | 40 | 3.1% | | |
| 50 | 2.0% | 40 | 3.8% | 50 | 2.1% | 50 | 1.3% | | |
| 60 | 1.9% | 50 | 2.6% | 60 | 1.2% | 70 | 1.8% | | |
| 70 | 1.4% | 60 | 1.8% | 80 | 1.6% | 100 | 1.3% | | |
| 80 | 1.3% | 70 | 1.6% | 100 | 1.3% | 200 | 2.2% | | |
| 90 | 1.0% | 80 | 1.3% | 200 | 2.5% | 600 | 1.4% | | |
| 200 | 5.6% | 90 | 1.1% | 300 | 1.2% | 3000 | 1.1% | | |
| 300 | 3.0% | 200 | 5.9% | 500 | 1.2% | | | | |
| 400 | 1.3% | 300 | 1.9% | 800 | 1.0% | | | | |
| 500 | 1.0% | 400 | 1.5% | 2000 | 3.0% | | | | |
| 600 | 1.1% | 500 | 1.2% | | | | | | |
| 1000 | 2.2% | 700 | 1.6% | | | | | | |
| 2000 | 3.1% | 1000 | 1.2% | | | | | | |
| 3000 | 1.5% | 2000 | 1.5% | | | | | | |
| 4000 | 2.3% | 5000 | 1.2% | | | | | | |

Level by Level Message Probability Distributions - Apple

| Top Level Width | Prob. | Level 1 Width | Prob. | Level 2 Width | Prob. | Level 3 Width | Prob. | Level 4 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.84% | 0 | 32.83% | 0 | 15.35% | 0 | 10.21% | 0 | 9.45% |
| 1 | 2.10% | 1 | 6.79% | 1 | 8.21% | 1 | 11.06% | 1 | 9.64% |
| 2 | 0.66% | 2 | 3.96% | 2 | 5.70% | 2 | 6.40% | 2 | 7.93% |
| 3 | 0.47% | 3 | 2.94% | 3 | 4.31% | 3 | 4.83% | 3 | 6.06% |
| 4 | 0.80% | 4 | 2.31% | 4 | 3.52% | 4 | 4.05% | 5 | 9.74% |
| 5 | 0.87% | 5 | 2.03% | 5 | 2.97% | 5 | 3.54% | 6 | 4.02% |
| 6 | 0.77% | 6 | 1.74% | 6 | 2.56% | 6 | 2.94% | 20 | 25.41% |
| 7 | 0.95% | 7 | 1.50% | 7 | 2.22% | 7 | 2.76% | 30 | 6.47% |
| 8 | 0.75% | 8 | 1.35% | 8 | 2.01% | 9 | 4.61% | 40 | 4.52% |
| 9 | 0.6% | 9 | 1.26% | 9 | 1.85% | 10 | 1.97% | 80 | 6.90% |
| 20 | 6.07% | 10 | 1.16% | 20 | 12.57% | 20 | 12.82% | 200 | 9.88% |
| 30 | 4.10% | 20 | 7.82% | 30 | 6.28% | 30 | 6.94% | | |
| 40 | 3.75% | 30 | 4.57% | 40 | 4.07% | 40 | 4.22% | | |
| 50 | 3.01% | 40 | 3.13% | 50 | 2.97% | 50 | 2.96% | | |
| 60 | 2.83% | 50 | 2.40% | 60 | 2.26% | 70 | 3.97% | | |
| 70 | 2.62% | 60 | 1.84% | 80 | 3.44% | 100 | 3.39% | | |
| 80 | 2.08% | 70 | 1.48% | 100 | 2.37% | 200 | 5.25% | | |
| 90 | 2.14% | 80 | 1.29% | 200 | 6.10% | 600 | 7.04% | | |
| 200 | 14.91% | 90 | 1.09% | 300 | | 3000 | 1.07% | | |
| 300 | | 200 | 6.54% | 500 | 5.11% | | | | |
| 400 | | 300 | | 800 | 2.55% | | | | |
| 500 | 17.52% | 400 | | 2000 | 3.58% | | | | |
| 600 | | 500 | 5.18% | | | | | | |
| 1000 | 11.03% | 700 | | | | | | | |
| 2000 | 8.22% | 1000 | 2.77% | | | | | | |
| 3000 | | 2000 | 1.92% | | | | | | |
| 4000 | 12.91% | 5000 | 2.09% | | | | | | |

Message Distribution Charts 1 through 5

Here is the same Apple data re-bucketed so that each bucket contains roughly five percentage points of probability. These are the actual values used in the benchmark.
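The re-bucketing itself can be sketched as a cumulative walk over the raw distribution, emitting a bucket boundary whenever roughly five percent of probability mass has accumulated. The input list below is made up for illustration.

```python
def rebucket(dist, target=0.05):
    # dist: (width, probability) pairs in ascending width order.
    # Emits (bucket upper bound, bucket mass) pairs of roughly `target` mass.
    buckets, acc, last = [], 0.0, None
    for width, p in dist:
        acc += p
        last = width
        if acc >= target:
            buckets.append((width, acc))
            acc = 0.0
    if acc > 0:
        buckets.append((last, acc))  # leftover tail bucket
    return buckets

dist = [(1, 0.02), (2, 0.02), (3, 0.02), (4, 0.02), (5, 0.02)]
print([(w, round(p, 2)) for w, p in rebucket(dist)])  # [(3, 0.06), (5, 0.04)]
```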

Level by Level Message Probability Distributions - Apple

| Top Level Width | Prob. | Level 1 Width | Prob. | Level 2 Width | Prob. | Level 3 Width | Prob. | Level 4 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.84% | 0 | 32.83% | 0 | 15.35% | 0 | 10.21% | 0 | 9.45% |
| 5 | 4.90% | 1 | 6.79% | 1 | 8.21% | 1 | 11.06% | 1 | 9.64% |
| 12 | 5.00% | 3 | 6.89% | 2 | 5.70% | 2 | 6.40% | 2 | 7.93% |
| 22 | 5.07% | 6 | 6.08% | 4 | 7.83% | 4 | 8.88% | 3 | 6.06% |
| 35 | 5.19% | 10 | 5.27% | 6 | 5.53% | 6 | 6.48% | 4 | 5.14% |
| 51 | 5.01% | 16 | 5.28% | 9 | 6.08% | 8 | 5.31% | 6 | 8.62% |
| 70 | 5.15% | 25 | 5.03% | 13 | 5.73% | 11 | 5.71% | 8 | 6.25% |
| 95 | 5.16% | 40 | 5.21% | 18 | 5.19% | 15 | 5.86% | 11 | 6.61% |
| 127 | 5.10% | 65 | 5.01% | 25 | 5.09% | 20 | 5.28% | 14 | 5.47% |
| 165 | 5.09% | 111 | 5.00% | 35 | 5.07% | 27 | 5.15% | 19 | 5.95% |
| 212 | 5.01% | 212 | 5.01% | 51 | 5.06% | 38 | 5.27% | 26 | 5.51% |
| 274 | 5.02% | 524 | 5.00% | 77 | 5.06% | 56 | 5.15% | 36 | 5.12% |
| 356 | 5.05% | 2577 | 5.00% | 126 | 5.05% | 91 | 5.06% | 55 | 5.11% |
| 466 | 5.03% | 3000+ | 1.60% | 239 | 5.00% | 169 | 5.01% | 104 | 5.05% |
| 623 | 5.02% | | | 654 | 5.00% | 462 | 5.00% | 359 | 5.01% |
| 855 | 5.01% | | | 2000+ | 5.05% | 1000+ | 4.17% | 500+ | 3.08% |
| 1232 | 5.01% | | | | | | | | |
| 1922 | 5.00% | | | | | | | | |
| 3275 | 5.01% | | | | | | | | |
| 4000+ | 8.33% | | | | | | | | |

Mailbox Distribution Profile

 

A mail server that supports IMAP is likely to support a hierarchy of several mailboxes (also known as folders) in addition to the default INBOX mailbox for each user. Below are several distributions used to construct the structure of mailboxes contained within an IMAP mail store. The data is extracted from the four enterprise data samples (Mirapoint, Openwave, Sun, Apple).

The following tables show the probability of an individual user having a certain number of mailboxes (i.e., folders) at each level (depth). The data reflects the probability distributions for the first five (5) levels, even though the actual samples went many levels deeper than that.

 

 

Level by Level Subfolder Probability Distributions - Mirapoint, Openwave, Sun

| Top to Level 1 Width | Prob. | Level 1 to 2 Width | Prob. | Level 2 to 3 Width | Prob. | Level 3 to 4 Width | Prob. | Level 4 to 5 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 34.9% | 1 | 31.4% | 1 | 43.0% | 1 | 39.6% | 1 | 36.8% |
| 2 | 21.7% | 2 | 12.4% | 2 | 14.9% | 2 | 12.6% | 2 | 7.9% |
| 3 | 11.6% | 3 | 7.4% | 3 | 9.1% | 3 | 8.1% | 3 | 39.5% |
| 4 | 7.0% | 4 | 5.6% | 4 | 6.8% | 4 | 10.8% | 4 | 5.3% |
| 5 | 2.0% | 5 | 4.0% | 5 | 3.5% | 5 | 2.7% | 6 | 2.6% |
| 6 | 2.4% | 6 | 2.4% | 6 | 4.1% | 6 | 7.2% | 7 | 2.6% |
| 7 | 1.5% | 7 | 5.0% | 7 | 2.0% | 7 | 2.7% | 8 | 5.3% |
| 8 | 0.7% | 9 | 5.8% | 8 | 2.0% | 8 | 0.9% | | |
| 9 | 0.7% | 10 | 2.6% | 9 | 3.3% | 9 | 1.8% | | |
| 10 | 0.7% | 15 | 7.4% | 10 | 1.0% | 14 | 3.6% | | |
| 20 | 8.1% | 20 | 3.2% | 20 | 5.8% | 15 | 0.9% | | |
| 30 | 3.7% | 30 | 7.2% | 30 | 3.0% | 20 | 3.6% | | |
| 40 | 1.8% | 70 | 3.4% | 40 | 0.5% | 25 | 2.7% | | |
| 50 | 2.0% | 200 | 1.8% | 50 | 0.5% | 30 | 1.8% | | |
| 103 | 1.3% | 246 | 0.4% | 61 | 0.3% | 42 | 0.9% | | |

 

Level by Level Subfolder Probability Distributions - Apple

| Top to Level 1 Width | Prob. | Level 1 to 2 Width | Prob. | Level 2 to 3 Width | Prob. | Level 3 to 4 Width | Prob. | Level 4 to 5 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.38% | 1 | 37.28% | 1 | 38.86% | 1 | 41.69% | 1 | 37.52% |
| 2 | 0.71% | 2 | 14.13% | 2 | 17.28% | 2 | 17.26% | 2 | 23.47% |
| 3 | 41.11% | 3 | 12.26% | 3 | 10.23% | 3 | 10.82% | 3 | 13.88% |
| 4 | 17.15% | 4 | 6.60% | 4 | 7.07% | 4 | 6.71% | 4 | 6.94% |
| 5 | 8.48% | 5 | 5.41% | 5 | 5.13% | 5 | 5.30% | 5 | 3.47% |
| 6 | 5.59% | 6 | 4.09% | 6 | 3.69% | 6 | 3.56% | 6 | 3.64% |
| 7 | 4.01% | 7 | 3.14% | 7 | 3.06% | 7 | 2.51% | 7 | 1.98% |
| 8 | 3.24% | 8 | 2.57% | 8 | 2.18% | 8 | 1.78% | 8 | 1.16% |
| 9 | 2.66% | 9 | 2.08% | 9 | 1.97% | 9 | 1.74% | 9 | 0.99% |
| 10 | 2.04% | 10 | 1.66% | 10 | 1.77% | 10 | 1.10% | 10 | 1.16% |
| 15 | 6.57% | 15 | 4.97% | 15 | 4.07% | 15 | 3.79% | 15 | 2.64% |
| 20 | 3.28% | 20 | 2.40% | 20 | 1.94% | 20 | 2.15% | 20 | 1.32% |
| 25 | 1.77% | 25 | 1.17% | 25 | 0.77% | 25 | 0.64% | 25 | 1.32% |
| 50 | 2.49% | 50 | 1.66% | 50 | 1.66% | 50 | 0.82% | 50 | 0.33% |
| 100 | 0.40% | 100 | 0.38% | 100 | 0.27% | 100 | 0.05% | | |
| 500 | 0.10% | 500 | 0.17% | 500 | 0.06% | 500 | 0.09% | | |
| 501+ | 0.01% | 501+ | 0.02% | | | | | 501+ | 0.17% |

Folder Distribution Charts 1 through 5

 

The following tables show the percent of folders at each level containing any subfolders.

 

Level by Level Folders With Any Subfolders - Mirapoint, Openwave, Sun

| Top to Level 1 Width | Prob. | Level 1 to 2 Width | Prob. | Level 2 to 3 Width | Prob. | Level 3 to 4 Width | Prob. | Level 4 to 5 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 59.0% | 0 | 64.0% | 0 | 80.0% | 0 | 78.4% | 0 | 97.4% |
| 1 | 21.9% | 1 | 20.6% | 1 | 15.7% | 1 | 14.4% | 1 | 2.6% |
| 2 | 7.5% | 2 | 9.0% | 2 | 2.3% | 2 | 3.6% | | |
| 3 | 3.3% | 3 | 1.4% | 3 | 0.8% | 3 | 1.8% | | |
| 4 | 2.0% | 4 | 1.2% | 4 | 1.0% | 4 | 1.8% | | |
| 5 | 0.4% | 5 | 0.8% | 6 | 0.2% | | | | |
| 6 | 1.3% | 6 | 1.0% | | | | | | |
| 7 | 2.0% | 7 | 0.6% | | | | | | |
| 8 | 0.4% | 8 | 0.2% | | | | | | |
| 9 | 0.7% | 9 | 0.2% | | | | | | |
| 10 | 0.7% | 10 | 0.4% | | | | | | |
| 11 | 0.4% | 15 | 0.4% | | | | | | |
| 21 | 0.25% | 19 | 0.2% | | | | | | |
| 26 | 0.15% | | | | | | | | |

Level by Level Folders With Any Subfolders - Apple

| Top to Level 1 Width | Prob. | Level 1 to 2 Width | Prob. | Level 2 to 3 Width | Prob. | Level 3 to 4 Width | Prob. | Level 4 to 5 Width | Prob. |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.58% | 0 | 90.62% | 0 | 92.14% | 0 | 92.81% | 0 | 95.25% |
| 1 | 2.02% | 1 | 3.65% | 1 | 3.28% | 1 | 2.70% | 1 | 2.04% |
| 2 | 0.77% | 2 | 1.62% | 2 | 1.36% | 2 | 1.69% | 2 | 1.10% |
| 3 | 0.67% | 3 | 0.96% | 3 | 0.85% | 3 | 1.00% | 5 | 0.98% |
| 4 | 0.36% | 4 | 0.66% | 4 | 0.53% | 5 | 0.75% | 10 | 0.63% |
| 5 | 0.29% | 5 | 0.48% | 5 | 0.42% | 10 | 1.05% | | |
| 6 | 0.22% | 6 | 0.35% | 6 | 0.28% | | | | |
| 7 | 0.17% | 7 | 0.29% | 7 | 0.20% | | | | |
| 8 | 0.14% | 8 | 0.20% | 8 | 0.14% | | | | |
| 9 | 0.11% | 9 | 0.18% | 9 | 0.14% | | | | |
| 10 | 0.09% | 10 | 0.17% | 10 | 0.66% | | | | |
| 20 | 0.58% | 15 | 0.82% | | | | | | |

Subfolder Distribution Charts 1 through 5

 

Mailbox Structure Example


Below is a walk-through of the construction of a folder tree, with a diagram illustrating the use of the above distribution tables in creating a folder tree for user “U1”. The probability values used are only examples, not actual distribution table entries.

 

Folder Level Construction for User “U1” Example

| Level | Next Level | Probability Computation | Diagram Representation |
|---|---|---|---|
| 0 | 1 | 10.1% probability of 10 sub-folders | Create folders A1 through A10 |
| | | 7.2% probability of 2 folders having sub-folders | Mark folders A5 and A10 red to indicate presence of Level 2 sub-folders |
| 1 | 2 | 6.3% probability of 7 sub-folders under A5 | Create folders B1 through B7 under A5 |
| | | 23.5% probability of 1 sub-folder under A10 | Folder B1 under A10 |
| 2 | 3 | 5.4% probability of 1 folder under A5 having any sub-folders | Mark folder A5.B5 red to indicate presence of Level 3 sub-folders |
| | | 32.4% probability of 0 folders under A10 having any sub-folders | No subfolders under A10.B1 |
| | | 35.8% probability of 1 level-3 sub-folder under A5.B5 | Create folder C1 under A5.B5 |
| 3 | 4 | 56.8% probability of 0 folders under A5.B5 having any sub-folders | No subfolders under A5.B5.C1 |

The diagram below shows the mailbox structure for user U1.

Figure 1: Mailbox Structure Diagram

Mailbox folder structure for User "U1" with ten (10) Level 1 subfolders, seven (7) subfolders under A5, and one (1) subfolder under A10; only Level 2 subfolder A5.B5 has a subfolder.
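A hypothetical sketch of this level-by-level construction: draw a subfolder width for each expanding folder from a per-level distribution and recurse. The two-bucket distributions below are stand-ins, not the benchmark tables.

```python
import random

# Per-level width distributions: each level maps to two (width, probability)
# buckets -- stand-in values only
WIDTH_DIST = {0: [(2, 0.5), (3, 0.5)], 1: [(1, 0.5), (2, 0.5)]}

def build_tree(level, rng, max_level=2):
    # Returns a nested dict: folder name -> dict of subfolders
    if level >= max_level:
        return {}
    (w1, p1), (w2, _) = WIDTH_DIST[level]
    width = w1 if rng.random() < p1 else w2
    return {f"F{level}_{i}": build_tree(level + 1, rng, max_level)
            for i in range(width)}

tree = build_tree(0, random.Random(42))
print(len(tree))  # 2 or 3 top-level folders, by construction
```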

Peak Hour Determination

The overall peak traffic hour must be based on both SMTP and the corresponding IMAP activity over the same period of time. Therefore, only two data samples were used to determine the relative workloads – Mirapoint and Openwave. The other data samples did not provide corresponding SMTP logs for this purpose.

Peak Hour Traffic Volumes and Active Users

The following table shows the overall traffic volumes and users over the course of the peak day (determined by the total message activity in the data samples).

 

Peak Mail Server Traffic – Enterprise

| Sample Hour | Mirapoint SMTP | Mirapoint IMAP | Mirapoint Combined | Mirapoint Unique Sender/Rcpt | Openwave SMTP | Openwave IMAP | Openwave Combined | Openwave Unique Sender/Rcpt |
|---|---|---|---|---|---|---|---|---|
| 0 | 503 | 57 | 560 | | 1169 | 5166 | 6335 | |
| 1 | 571 | 60 | 631 | | 1289 | 5435 | 6724 | |
| 2 | 519 | 60 | 579 | | 1033 | 4319 | 5352 | |
| 3 | 456 | 60 | 516 | | 1114 | 4210 | 5324 | |
| 4 | 479 | 60 | 539 | | 1158 | 4054 | 5212 | |
| 5 | 503 | 63 | 566 | | 1076 | 3777 | 4853 | |
| 6 | 550 | 60 | 610 | | 1108 | 3503 | 4611 | |
| 7 | 869 | 103 | 972 | | 1042 | 4566 | 5608 | |
| 8 | 942 | 606 | 1548 | | 1449 | 7383 | 8832 | |
| 9 | 1198 | 1075 | 2273 | | 2174 | 7315 | 9489 | |
| 10 | 1029 | 2278 | 3307 | 90/160 | 2082 | 7247 | 9329 | |
| 11 | 987 | 23015 | 24002 | | 2217 | 6331 | 8548 | |
| 12 | 874 | 2052 | 2926 | | 2079 | 6186 | 8265 | |
| 13 | 978 | 1507 | 2485 | | 2120 | 7784 | 9904 | |
| 14 | 1560 | 1235 | 2795 | | 2818 | 8246 | 11064 | |
| 15 | 1485 | 1119 | 2604 | | 3809 | 10196 | 14005 | |
| 16 | 841 | 783 | 1624 | | 4846 | 10620 | 15466 | 1836/1836 |
| 17 | 803 | 541 | 1344 | | 5665 | 9306 | 14971 | |
| 18 | 502 | 466 | 968 | | 5513 | 8504 | 14017 | |
| 19 | 360 | 412 | 772 | | 5125 | 6462 | 11587 | |
| 20 | 316 | 249 | 565 | | 4177 | 6260 | 10437 | |
| 21 | 476 | 215 | 691 | | 4440 | 6067 | 10507 | |
| 22 | 377 | 218 | 595 | | 4271 | 6133 | 10404 | |
| 23 | 340 | 229 | 569 | | 4004 | 6178 | 10182 | |
| Daily | 17,518 | 36,523 | 54,041 | 238/168 | 65,778 | 155,248 | 221,026 | 2,254/3,000 |

 

The Peak Hour Percent of Active Users is computed by first taking the larger of the two unique Sender or Recipient user counts and dividing that value by the total number of provisioned users. The percentages are then pro-rated based on the relative number of actual users to compute the Percent of Active Users used in the benchmark.
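The computation can be reproduced directly from the peak-hour and provisioned user counts in this section's tables:

```python
# (peak-hour unique users, provisioned users) per data sample; the peak-hour
# figure is the larger of the unique sender and recipient counts
samples = {"Mirapoint": (160, 269), "Openwave": (1836, 2299)}

for name, (peak, prov) in samples.items():
    print(name, round(100 * peak / prov))  # 59 and 80

# Pro-rating by user counts amounts to pooling the two samples
peak_total = sum(peak for peak, _ in samples.values())
prov_total = sum(prov for _, prov in samples.values())
print(round(100 * peak_total / prov_total))  # 78
```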

 

Peak Hour Percent Provisioned Users

| Company (Source) | Data Type | Number of Users | Percent PH/Prov |
|---|---|---|---|
| Mirapoint | Peak Hour | 160 | |
| | Provisioned | 269 | 59% |
| Openwave | Peak Hour | 1,836 | |
| | Provisioned | 2,299 | 80% |
| Normalized PH Percent Active Users | | | 78% |

 

PEAK_PCT_USERS = "78"

The Peak Hour Activity Percentage can be derived by using the traffic volume from the peak hour and the daily total for each protocol. Again, the benchmark value is computed by pro-rating each data sample within the overall user counts.

 

Peak Hour Percent of Daily Traffic by Protocol

| Company | SMTP | IMAP | Combined |
|---|---|---|---|
| Mirapoint | 6% | 6% | 11% |
| Openwave | 7% | 7% | 10% |
| Normalized PH Percent of Daily Traffic | | | 10% |

 

 

 

SMTP log files

SMTP Traffic Analysis

The SMTP log files reflect mail transfer agent workloads from the four enterprises: three were collected over fourteen (14) to thirty (30) days of operation, and one covers a single day. The workload comprises all the requests processed by the mail server for delivering incoming and outgoing messages. These enterprises ranged from approximately 120 to 40,000 users. The data logs cover the full 24-hour day over the course of the data collection period.

The parameters used to describe the requests processed by the mail server are:

  • time stamp of the request
  • size [byte]
  • number of recipients

The table below shows the statistics for SMTP traffic flows and message sizes for the four enterprises. The ISP user profile statistics are included to illustrate the difference from the original user model.

 

SMTP In/Out-bound Traffic – Enterprise

| Data Source | Percent Inbound Traffic | Percent Outbound Traffic | Average Message Size (KB) | Data Source Type |
|---|---|---|---|---|
| Mirapoint | 85% | 15% | 24 | Small company |
| Openwave | 92% | 8% | 44 | Medium company |
| Sun | 98% | 2% | 23 | Medium workgroup |
| Apple | Unknown | Unknown | 105 | Large corporation |
| SPECmail2009 Enterprise Model | 93% | 7% | 101 | Pro-rated medium/large company |
| SPECmail2008 Enterprise Model | 93% | 7% | 38 | Pro-rated small/medium company |
| SPECmail2001 (Dialup ISP Model) | 53% | 47% | 25 | Consumer Dialup |


The following two tables contain the profile of the number of recipients per message, based on the Mirapoint and Openwave SMTP data from the busiest day of the week. The Apple SMTP sample did not distinguish between remote and local recipients, nor record the actual recipient count from the RCPT TO step, so this version reuses the original recipient information. This data was extracted both from recipients named in the RCPT TO lines and from recipient counts based on mailing list expansions. The benchmark uses the probability distributions in the second table to generate the actual SMTP traffic.

 

Peak Hour SMTP Message Rate Comparison

| Company (Source) | Data Type | Peak Hour Total Mesg/User | Daily Total Mesg/User | Peak Hour Mesg per Unique User |
|---|---|---|---|---|
| Mirapoint | Sender | 11.4 | 73.6 | 8.7 |
| | Recipient | 6.4 | 65.1 | 6.7 |
| Openwave | Sender | 2.6 | 29.2 | 0.5 |
| | Recipient | 2.6 | 21.9 | 4.9 |
| Normalized PH Messages Per User | | 5 | | |

 

Peak Hour From/To Analysis

| Company (Source) | Data Type | From Local to Remote | From Local to Local | From Remote to Local |
|---|---|---|---|---|
| Mirapoint | Count | 84 | 431 | 262 |
| | % of Total | 11% | 55% | 34% |
| Openwave | Count | 195 | 789 | 429 |
| | % of Total | 14% | 56% | 30% |
| Normalized PH SMTP Message Flow | | 13% | 56% | 31% |

 

 

SMTP Recipients per Message – Enterprise

| Data Source | Minimum | Average | Maximum |
|---|---|---|---|
| Mirapoint | 1 | 2.0 | 133 |
| Openwave | 1 | 3.3 | 74 |
| Sun Microsystems | n/a | n/a | n/a |
| Apple | 1 | 3.9 | 2061 |
| SPECmail2009 Benchmark | 1 | 3.8 | 2061 |
| SPECmail2008 Benchmark | 1 | 3.1 | 133 |
| SPECmail2001 Benchmark | 1 | 2 | 20 |

 

SMTP Recipients per Message Distribution

| Recipients | Probability (Mirapoint, Openwave, Sun) | Probability (Apple) |
|---|---|---|
| 1 | 46.3875% | 75.11% |
| 2 | 11.00% | 8.03% |
| 3 | 9.00% | 6.08% |
| 4 | 8.00% | 1.59% |
| 5 | 7.00% | 1.10% |
| 6 | 6.00% | 1.48% |
| 7 | 5.00% | 0.61% |
| 8 | 4.00% | 0.40% |
| 9 | N/A | 0.34% |
| 10 | 2.00% | 0.30% |
| 11 | 1.00% | N/A |
| 12 | 0.30% | N/A |
| 13 | 0.10% | |
| 14 | 0.05% | |
| 15 | 0.05% | 2.44% |
| 16 | 0.05% | |
| 20 | N/A | 0.69% |
| 25 | N/A | 0.57% |
| 30 | 0.05% | |
| 50 | 0.01% | 0.69% |
| 100 | 0.0025% | 0.39% |
| 500 | N/A | 0.14% |
| 1000 | N/A | 0.03% |
| 5000 | N/A | 0.002% |

 

SMTP Recipient Chart

 

As stated above, the recipient distribution includes traffic routed through mailing lists. The data showed that 7-15% of overall SMTP traffic was sent to a mail distribution list. This data is included in the recipient distributions above and is also broken out separately below. The benchmark does not require creation of any distribution lists.

 

Mailing List Count Profile

| Source | Minimum | Average | Maximum |
|---|---|---|---|
| Mirapoint | n/a | n/a | n/a |
| Openwave | 1 | 12 | 58 |
| Sun | n/a | n/a | n/a |
| Apple | 1 | 18 | 2061 |
| SPECmail2001 | n/a | n/a | n/a |

 

Mailing List Count Distribution - Openwave

| Recipients | Probability | Recipients | Probability | Recipients | Probability |
|---|---|---|---|---|---|
| 1 | 15.2% | 11 | 2.2% | 22 | 1.0% |
| 2 | 8.4% | 12 | 2.5% | 24 | 1.8% |
| 3 | 5.5% | 13 | 2.2% | 25 | 1.3% |
| 4 | 6.0% | 14 | 2.3% | 28 | 1.1% |
| 5 | 6.9% | 15 | 1.4% | 30 | 0.8% |
| 6 | 6.9% | 16 | 1.8% | 33 | 0.9% |
| 7 | 4.4% | 17 | 2.0% | 40 | 2.5% |
| 8 | 4.3% | 18 | 1.1% | 50 | 2.0% |
| 9 | 4.8% | 19 | 0.5% | 60 | 4.7% |
| 10 | 3.6% | 21 | 2.0% | | |

 

 

 

Mailing List Count Distribution - Apple

| Recipients | Probability | Recipients | Probability | Recipients | Probability |
|---|---|---|---|---|---|
| 1 | 9.10% | 11 | 5.09% | 25 | 3.67% |
| 2 | 6.70% | 12 | 1.61% | 50 | 4.43% |
| 3 | 25.60% | 13 | 2.89% | 100 | 2.56% |
| 4 | 5.62% | 14 | 3.85% | 500 | 0.89% |
| 5 | 4.65% | 15 | 1.74% | 1000 | 0.20% |
| 6 | 8.72% | 16 | 0.58% | 5000 | 0.01% |
| 7 | 3.23% | 17 | 1.10% | | |
| 8 | 2.14% | 18 | 0.77% | | |
| 9 | 1.80% | 19 | 0.95% | | |
| 10 | 1.39% | 20 | 0.72% | | |

List Recipient Chart

 

SMTP Message Analysis

The SPECmail2001 method created a single-level message that met a fixed message size distribution. Originally, SPECmail2009 attempted to follow the same criteria: generate messages according to the MIME distributions and then map these messages to the final SMTP-derived message size distribution. However, the MIME-distribution-compliant messages did not match the message size distribution derived from the SMTP logs. Analysis found the main reason for this size discrepancy: the IMAP message samples form only a subset of the messages flowing through the e-mail system. The POP3 users on these same e-mail servers delete a substantial proportion of their messages, so those messages never remain on the e-mail server. Therefore, these missing messages did not contribute to the MIME profile of the whole mail store.

Subsequent benchmark design choices prioritized message MIME parts structure and enclosure sizes over the derived SMTP message sizes. The SMTP log derived findings are presented here, but not used by the benchmark.

The SMTP log derived message size data differs between Consumer and Enterprise users. As with the earlier SPECmail2001, the size of each message is counted in message size buckets. Overall, the average message size processed through the MTA increased from 24.5 KB to 38.57 KB, then to 101 KB.
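The bucket counting mentioned above assigns each message to the smallest size boundary that holds it; the boundaries below are assumptions for illustration, not the benchmark's actual bucket list.

```python
BOUNDARIES = [256, 512, 1024, 2048, 4096, 8192]  # bucket upper bounds, bytes

def bucket_for(size):
    # Return the smallest boundary >= size; oversized messages land in the
    # last bucket
    for b in BOUNDARIES:
        if size <= b:
            return b
    return BOUNDARIES[-1]

print(bucket_for(300))    # 512
print(bucket_for(4096))   # 4096
print(bucket_for(99999))  # 8192
```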

Message Size Statistics from SMTP Samples

| SPEC OSG Benchmark | Minimum (KB) | Average (KB) | Median (KB) | Maximum (MB) |
|---|---|---|---|---|
| SPECmail2001 | 1 | 24.5 | 2.5 | 2.7 |
| SPECmail2008 | 0.05 | 38.57 | 3.56 | 21.0 |
| SPECmail2009 | 2 | 105 | 6 | 139 |

The following tables describe the size distribution of all messages that flowed through the SMTP servers. This information is included here for completeness.

SMTP Message Size Probability Distribution

| Message Size | Probability |
|---|---|
| 256 | 0.65% |
| 512 | 6.46% |
| 1 KB | 17.50% |
| 2 KB | 31.90% |
| 4 KB | 22.47% |
| 8 KB | 9.12% |
| 16 KB | 4.03% |
| 64 KB | 4.25% |
| 256 KB | 2.39% |
| 1 MB | 0.87% |
| 4 MB | 0.32% |
| 1 GB | 0.04% |

 

SMTP Message Size Probability Distribution - Apple

| Message Size | Probability |
|---|---|
| 256 | |
| 512 | |
| 1 KB | |
| 2 KB | 2.92% |
| 4 KB | 38.65% |
| 8 KB | 20.58% |
| 16 KB | 11.30% |
| 64 KB | 15.43% |
| 256 KB | 5.76% |
| 1 MB | 3.15% |
| 4 MB | 1.77% |
| 1 GB | 0.45% |

SMTP Size Chart

The SPECmail2009 Enterprise message size distribution has shifted towards larger values. The original enterprise sample's median fell in the 2 KB bucket; in the newer data sample it lies between 4 KB and 8 KB. The majority of messages are still slightly less than 8 KB in size. However, SPECmail2009 creates significantly more large messages: of all messages greater than 16 KB in size, SPECmail2008 created about 11.9%, compared to the 37.8% created by SPECmail2009.

SMTP Message Rates

The corporate SMTP samples showed the following characteristics.

SMTP Message Inter-Arrival Time

| Mean (s) | Standard Deviation | Minimum | Maximum |
|---|---|---|---|
| 2.80 | 2.37 | 0 | 15 |

 

SMTP Normalized Profile (Peak Hour)

| Config Parameter | Value | Definition |
|---|---|---|
| PEAK_PCT_USERS | 78 | Percent of provisioned users receiving messages in the peak hour (also known as 'Active Users') |
| MSG_RECEIVED_PER_PEAK_HOUR | 5 | Number of messages received by 'Active Users' in the peak hour |
| LOCAL_TO_LOCAL_PCT | 56 | Percent of total messages sent from Local users to Local users |
| REMOTE_TO_LOCAL_PCT | 31 | Percent of total messages sent from Remote users to Local users |
| LOCAL_TO_REMOTE_PCT | 13 | Percent of total messages sent from Local users to Remote users |
| PEAK_LOAD_PERCENT | 32 | Percent of the daily load occurring during the peak hour |

Workload models

We have built a model for each of the parameters characterizing the SMTP requests.

Inter-arrival time distribution

The message inter-arrival time computation uses a simplified model because the total number of messages tends not to be enough to fit a complex distribution. The time between message deliveries is therefore the load test run time divided by the total number of messages delivered during that run:

Inter-arrival Time (s) = Load Test Time (s) / [ (Number of Active Users) X (Messages per User) X (Recipients per Message) ]
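With illustrative example inputs (none of these are mandated benchmark values), the computation looks like this:

```python
# Made-up example inputs for the simplified inter-arrival model
active_users = 780        # e.g. 78% of 1,000 provisioned users
msgs_per_user = 5         # messages received per user in the peak hour
recipients_per_msg = 3.8  # average recipients per message
run_time_s = 3600.0       # one-hour load test

total_deliveries = active_users * msgs_per_user * recipients_per_msg
inter_arrival_s = run_time_s / total_deliveries
print(round(inter_arrival_s, 4))  # 0.2429
```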

 

Message Construction

As described in the message size and MIME parts analysis, the benchmark chose to follow message structural and attachment size distributions rather than the total message size distribution used by the earlier SPECmail2001 benchmark. In that benchmark, the email server did not need to care about the actual MIME structure of a message and recognized just headers versus body parts. IMAP4 email clients, however, understand the concept of attachments and expect the e-mail server to understand the various message parts, which means the e-mail server must evaluate the actual structure of each message. Message structure and individual attachment sizes therefore determine the actual message size, since the MIME structural description is embedded in the message but not visible to most users.

The benchmark uses the above MIME Parts, MIME Part Sizes and MIME Depth distribution tables to construct each message stored in the mail store.

MIME_PART_SIZE = "64,0.40%; 128,5.18%; 256,2.28%; 512,6.37%; 1024,9.22%; 2048,18.00%; 4096,28.97%; 8192,11.37%; 16384,6.46%; 32768,3.91%; 65536,3.02%; 131072,1.88%; 262144,1.21%; 524288,0.68%; 1048576,0.45%; 2097152,0.60%"

Number of recipients

Unlike the Consumer ISP user model, the Enterprise user model's Number of Recipients Per Message distribution is not overwhelmingly dominated by a single value. The effects of internal distribution lists shifted the mean (5) away from the median (1). Also, because mail distribution lists tend to be used inside enterprises, the maximum recipient count was allowed to reach one hundred (100).

MSG_RECP_DISTRIBUTION = "1,75.11%; 2,8.03%; 3,6.08%; 4,1.59%; 5,1.10%; 6,1.48%; 7,0.61%; 8,0.40%; 9,0.34%; 10,0.30%; 15,2.44%; 20,0.69%; 25,0.57%; 50,0.69%; 100,0.39%; 500,0.18%"

How to use the models

The models described in the previous sections can be used to reproduce the behavior of the real workload of a mail server. In particular, the use of these models is based on sampling the various distributions identified for each of the three characterizing parameters.

To sample the Weibull distribution obtained for the inter-arrival times, it is necessary to invert the function and derive the inter-arrival time from the probability distribution. In short, let u denote a random number uniformly distributed between 0 and 1; the inter-arrival time t_i between the (i-1)-th and i-th request is given by:

                       t_i = a * (-ln(u))^(1/b)            (***)

where ln denotes the natural logarithm, and a and b are the parameters of the Weibull function. The procedure starts by drawing a random number u and computing the corresponding value of t_i using the formula above. Note that u must be strictly greater than 0. For u = 1, the value of the inter-arrival time is equal to zero.
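The inversion can be sketched in Python; the parameter names a and b follow the text, and the loop guards the u > 0 requirement (a minimal sketch, not the benchmark's implementation):

```python
import math
import random

def weibull_interarrival(a, b, rng=random):
    """Draw one inter-arrival time t = a * (-ln u)^(1/b), u uniform in (0, 1).

    random.random() returns values in [0, 1), so u can never be 1; the loop
    re-draws in the (vanishingly rare) case u == 0, which the text forbids.
    """
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return a * (-math.log(u)) ** (1.0 / b)
```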

In the case of message size and number of recipients, it is necessary to sample the distribution obtained from the buckets. Again, a random number uniformly distributed between 0 and 1 is drawn and mapped onto the cumulative percentages of the buckets.
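Sampling a bucketed distribution such as MSG_RECP_DISTRIBUTION can be sketched as follows. The parser assumes only the "value,percent%; ..." string format shown above; the function names are illustrative.

```python
import random

def parse_buckets(spec):
    """Parse 'value,pct%; value,pct%; ...' into (value, probability) pairs."""
    pairs = []
    for item in spec.split(";"):
        value, pct = item.strip().split(",")
        pairs.append((int(value), float(pct.rstrip("%")) / 100.0))
    return pairs

def sample_bucket(pairs, rng=random):
    """Map a uniform draw onto the cumulative distribution of the buckets."""
    u = rng.random()
    cumulative = 0.0
    for value, prob in pairs:
        cumulative += prob
        if u < cumulative:
            return value
    return pairs[-1][0]   # guard against rounding in the percentages

# Example: draw a recipient count from a two-bucket toy distribution.
recipients = sample_bucket(parse_buckets("1,75.11%; 2,24.89%"))
```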
 

Scaling issues

Scaling issues arise when the workload model has to represent the load of mail servers with a smaller or larger number of users. This is particularly the case for the arrival of requests, whose rate depends on the number of users of the mail server. The SPECmail2001 benchmark assumed linear behavior of the arrival rates; that is, the arrival rate of requests for a mail server with 100,000 users is 10 times the arrival rate for a mail server with 10,000 users. However, the SPECmail2009 benchmark will never approach the lower limits mandated in the previous model: there are just not that many enterprises with 10,000 employees, much less 100,000. A second consideration is that the number of new messages arriving during the peak hour is only a very small portion of each user's mail store.

Therefore, the messages arriving for each user are spread at regular intervals. The overall SMTP workload is determined by the message count multiplied by the number of users, divided by sixty (60) seconds.

IMAP4 log files

The IMAP data have been collected from various mail servers at the University of Wollongong, Purdue University, Mirapoint, Openwave and Sun. The measurements were collected for fourteen (14) to thirty (30) days. From these log files, IMAP commands from individual IMAP sessions were grouped together and analyzed.

The parameters used to describe each IMAP session are:

  • time stamp
  • number of messages and mailbox size
  • number and size of messages retrieved/deleted within a session

The table below shows the IMAP command percentages generated during each data sample’s peak hour.

IMAP Peak Hour – Based on % of Daily Traffic

Mirapoint  Openwave  Purdue  Sun    Wollongong  Average
13.0%      6.8%      8.6%    11.6%  13.3%       10.7%


The statistics refer to one working day (24 hours); we have observed that the behavior of the users is very different over weekend days.

IMAP Session Model

Analysis of the tens of thousands of individual IMAP sessions led to the conclusion that different IMAP e-mail clients use different interaction models. Furthermore, these models are more complex than POP3 sessions. Each IMAP e-mail client can initiate from one (1) to five (5) concurrent sessions, each with its own distinct session initiation pattern.

 

This leads to a complex IMAP Session model, defined by the combination of two categories: client-type and command sequences. A command-sequence is a series of IMAP commands performing one or more mail operations within a specific session. A client-type is a collection of one or more command-sequences.

 

The following table describes the criteria for each command sequence.

Command Sequence 1
Client software: Netscape (Mozilla), Pine, Mulberry
General characteristics:
  • Create connection
  • Perform several operations using a variety of commands (probe folders for new messages, delete and move messages, update flags, list available folders, append messages, search for messages, checkpoint, etc.)
  • Occasionally probe folders for new messages
  • Fetch headers if any messages arrived
  • Occasionally fetch the body (whole or parts of the body)
  • Focuses on a specific folder
  • Does not log out of the session
Comments: This is one of the “primary” sessions that tend to stay logged into the IMAP server for many hours or days. Netscape uses UID commands; Pine and Mulberry do not. Probing folders is accomplished by:
  1. Netscape: NOOP; UID FETCH n:* (FLAGS)
  2. Mulberry: SEARCH UNSEEN; SEARCH DELETED; FETCH 1:m (FLAG ENVELOPE BODYSTRUCTURE, …)
  3. Pine: NOOP

Command Sequence 2
Client software: Outlook, Outlook Express, Mulberry
General characteristics:
  • Create connection
  • Perform several operations using a variety of commands (probe folders for new messages, delete and move messages, update flags, list available folders, append messages, search for messages, checkpoint, etc.)
  • Occasionally fetch headers
  • Occasionally fetch the header and whole body
  • Does not focus on a specific folder
  • Does not log out of the session
Comments: This is one of the “primary” sessions that tend to stay logged into the IMAP server for many hours or days. Probing folders is accomplished by these IMAP commands:
  • UID FETCH n:* (UID, BODY.PEEK[HEADER], …)
  • UID FETCH 1:n-1 (UID FLAGS)

Command Sequence 3
Client software: Fetchmail, Outlook Express
General characteristics:
  • Create connection
  • Fetch headers
  • Fetch whole body
  • Logout
Comments: These sessions are very sporadic and show dependency on results returned from Command Sequence 4.

Command Sequence 4
Client software: Outlook, Outlook Express, Netscape (periodic or triggered actions)
General characteristics:
  • Create connection
  • Occasionally probe folders for new messages
  • Occasionally issue other IMAP commands that do not alter the state of the mailstore (such as UNSUBSCRIBE or LIST)
  • Sometimes logs out, but not always
Comments: These sessions show very automated behavior and are generated at fixed intervals for each user. Probing folders is accomplished by:
  • Outlook 2002 – Inbox: UID FETCH m:* (UID, BODY.PEEK[HEADER], …); or UID FETCH 1:n (UID FLAGS)
  • Outlook 2002 – Others: LSUB "" "*"; or STATUS "mailbox name 1" (UNSEEN); …; STATUS "mailbox name n" (UNSEEN)
  • Outlook Express: STATUS "mailbox name" (MESSAGES UNSEEN)

Command Sequence 5
Client software: Mulberry, Netscape
General characteristics:
  • Create connection
  • Occasionally list or probe folders
  • Perform specific tasks, such as deleting or appending messages
  • Alters the state of the mail store
  • Logout
Comments: These sessions tend to focus on a specific set of tasks and then log out of the IMAP server.

 

IMAP4 clients use one or more of the five (5) command sequences. The IMAP4 benchmark emulates four (4) client types. During the benchmark run, each of these client-type threads represents a single user. A client may connect one or more times to the IMAP server.

IMAP Client Classifications and Sequence Map

Client Type 1 (Command Sequences 1, 4): These two (2) command sequences operate independently and concurrently. Some of these clients use the message index number while others use the message UID.

Client Type 2 (Command Sequences 1, 4, 5): These three (3) command sequences operate independently and concurrently. Some of these clients use the message index number while others use the message UID.

Client Type 3 (Command Sequences 2, 3, 4): Command Sequence 3 IMAP commands and activities are based on the results from the other command sequences.

Client Type 4 (Command Sequences 2, 4, 5): These three (3) command sequences operate independently and concurrently. The message index number is used instead of the message UID.

 

The compliant run uses the following combination to determine sequencing and dependencies.

CLIENT_TYPE_DISTRIBUTION = "1,31.373%; 3,32.353%; 4,3.922%; 5,2.941%; 13,3.922%; 14,10.784%; 15,1.961%; 24,0.980%; 34,2.941%; 45,2.941%; 134,0.980%; 145,3.922%; 1245,0.980%"

Each tuple defines a command-sequence grouping (1 == CS1, 34 == CS3+CS4) and the percentage of overall load-generator client threads that will implement that combination. The number of IMAP sessions varies as this matrix changes. Each load-generator thread is assigned one specific combination.
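The tuple encoding can be decoded as in this sketch (function and variable names are illustrative, not taken from the benchmark source):

```python
def parse_client_types(spec):
    """Decode 'digits,pct%; ...' tuples into (command-sequence list, fraction).

    Each digit in the first field names one command sequence, so '34' means
    CS3 and CS4 running concurrently in the same client thread.
    """
    combos = []
    for item in spec.split(";"):
        digits, pct = item.strip().split(",")
        sequences = [int(d) for d in digits]      # e.g. '134' -> [1, 3, 4]
        combos.append((sequences, float(pct.rstrip("%")) / 100.0))
    return combos

combos = parse_client_types("1,31.373%; 34,2.941%; 1245,0.980%")
# combos[1] is ([3, 4], 0.02941): 2.941% of threads run CS3+CS4 concurrently
```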

IMAP Sample Selection

The extracted IMAP sessions were categorized according to command sequence types. However, some command sequences had an enormous number of sessions, while other command sequences' sample counts corresponded to the number of users. The fact that each client type uses more than one command sequence also forces the IMAP session selection criteria to gather all related sessions.

The final selection criteria grouped all IMAP sessions by IMAP user name (found in each session's login state). The resulting data set provides a more coherent model of not only the individual primary command sequences (the premise of the SPECmail2001 benchmark) but also the number of related IMAP sessions and actions.

IMAP Command States

The IMAP command set allows many combinations of parameters and options. This means that a single IMAP command can perform more than one logical task, and on one or more messages at the same time. The best example is the FETCH command and its variant, UID FETCH. This single command has been used to retrieve not only the message body, but also message meta-data and headers, and as a means to probe a folder for new messages. The latter (folder probe) is also complemented by the IMAP STATUS command, which provides a summary of old/new/deleted messages.

The versatility of the IMAP command set leads to a need to expand the concept of a state from a simple command to the specific combination of a command and its parameters. Included in this combination is the number of messages encompassed by that command state, as well as whether it operates against an individual message, a contiguous series, or a disjoint set of messages.

IMAP State Codes tracked by the SPECmail2009 Benchmark

1. APPEND
2. CHECK
3. CLOSE
4. COPY_NUM_FOLDER
5. COPY_RANGE_FOLDER
6. CREATE
7. DELETE
8. EXAMINE_FOLDER
9. EXAMINE_INBOX
10. EXAMINE_INBOXSENT
11. EXAMINE_SENT
12. EXAMINE_SENT_ITEMS
13. EXPUNGE
14. FETCH_NUM
15. FETCH_NUM_BODYALL
16. FETCH_NUM_BODYPARTS
17. FETCH_NUM_BODYPEEK
18. FETCH_NUM_BODYPEEK_HEADER
19. FETCH_NUM_BODYPEEK_HEADERFIELDS
20. FETCH_NUM_BODYSTRUCTURE_FLAGS
21. FETCH_NUM_BODY_BODYALL_HEADERFIELDS
22. FETCH_NUM_BODY_HEADER
23. FETCH_NUM_ENVELOPE_BODYPEEK_HEADERFIELDS_BODYSTRUCTURE_FLAGS_INTERNALDATE_RFC822SIZE
24. FETCH_NUM_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
25. FETCH_NUM_FLAGS
26. FETCH_NUM_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
27. FETCH_NUM_FLAGS_BODYSTRUCTURE_ENVELOPE_INTERNALDATE_RFC822SIZE_UID
28. FETCH_NUM_RFC822HEADER
29. FETCH_NUM_RFC822TEXT
30. FETCH_NUM_UID
31. FETCH_NUM_UID_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
32. FETCH_RANGE_UID
33. FETCH_RANGE_BODYPEEK_HEADERFIELDS
34. FETCH_RANGE_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
35. FETCH_RANGE_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
36. FETCH_RANGE_FLAGS_BODYSTRUCTURE_ENVELOPE_INTERNALDATE_RFC822SIZE_UID
37. FETCH_RANGE_UID_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
38. FETCH_SERIES_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
39. FETCH_SERIES_ENVELOPE_BODYSTRUCTURE_INTERNALDATE_RFC822SIZE
40. FETCH_SERIES_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
41. FETCH_SERIES_UID
42. FETCH_UID
43. LIST
44. LOGIN
45. LOGOUT
46. LSUB_NULL_FOLDER
47. LSUB_NULL_PART
48. LSUB_NULL_SENT
49. LSUB_NULL_WILDCARD
50. LSUB_WILDCARD_WILDCARD
51. NOOP
52. RENAME_FOLDER_FOLDER
53. RENAME_INBOXINBOXSENT_INBOXTRASHINBOXSENT
54. SEARCH_ALL_DELETED
55. SEARCH_ALL_RANGE_CHARSET_RFCHEADER
56. SEARCH_ALL_RFCHEADER
57. SEARCH_ALL_UNDELETED_UNSEEN
58. SEARCH_DELETED
59. SEARCH_RFCHEADER
60. SEARCH_UNDELETED
61. SEARCH_UNSEEN
62. SELECT_
63. SELECT_FOLDER
64. SELECT_FOLDER_ITEMS
65. SELECT_INBOX
66. SELECT_INBOXSENT
67. SELECT_INBOXSENT_ITEMS
68. SELECT_SENT
69. SELECT_SENT_ITEMS
70. STARTED
71. STATUS_FOLDER_ITEMS_MESSAGES_UNSEEN
72. STATUS_FOLDER_ITEMS_UNSEEN
73. STATUS_FOLDER_MESSAGES
74. STATUS_FOLDER_MESSAGES_RECENT_UNSEEN_UIDVALIDITY_UIDNEXT
75. STATUS_FOLDER_MESSAGES_UNSEEN
76. STATUS_FOLDER_UIDNEXT
77. STATUS_FOLDER_UIDNEXT_UIDVALIDITY_MESSAGES
78. STATUS_FOLDER_UNSEEN
79. STATUS_INBOXSENT_ITEMS_MESSAGES_UNSEEN
80. STATUS_INBOXSENT_ITEMS_UNSEEN
81. STATUS_INBOXSENT_UNSEEN
82. STATUS_INBOXSENT_MESSAGES_UNSEEN
83. STATUS_INBOX_MESSAGES_RECENT_UNSEEN_UIDVALIDITY_UIDNEXT
84. STATUS_INBOX_MESSAGES_UNSEEN
85. STATUS_INBOX_UIDNEXT
86. STATUS_INBOX_UIDNEXT_UIDVALIDITY_MESSAGES
87. STATUS_INBOX_UNSEEN
88. STATUS_SENT_ITEMS_MESSAGES_UNSEEN
89. STATUS_SENT_ITEMS_UNSEEN
90. STATUS_SENT_MESSAGES_UNSEEN
91. STATUS_SENT_UNSEEN
92. STORE_NUM_SET_FLAGS_ANSWERED
93. STORE_NUM_SET_FLAGS_DELETED
94. STORE_NUM_SET_FLAGS_SEEN
95. STORE_NUM_UNSET_FLAGS_DELETED
96. STORE_NUM_UNSET_FLAGS_SEEN
97. STORE_RANGE_SET_FLAGS_DELETED
98. STORE_RANGE_SET_FLAGS_SEEN
99. STORE_SERIES_SET_FLAGS_DELETED
100. STORE_UNTILEND_SET_FLAGS_DELETED
101. STORE_UNTILEND_SET_FLAGS_SEEN
102. SUBSCRIBE_FOLDER
103. SUBSCRIBE_INBOXSENT
104. UID_COPY_NUM_FOLDER
105. UID_COPY_NUM_INBOX
106. UID_COPY_NUM_INBOXSENT
107. UID_COPY_RANGE_FOLDER
108. UID_COPY_RANGE_INBOX
109. UID_COPY_RANGE_INBOXSENT
110. UID_COPY_SERIES_FOLDER
111. UID_FETCH_NUM_BODY
112. UID_FETCH_NUM_BODYALL
113. UID_FETCH_NUM_BODYPARTS
114. UID_FETCH_NUM_BODYPEEK
115. UID_FETCH_NUM_BODYPEEKALL
116. UID_FETCH_NUM_BODYPEEK_HEADER
117. UID_FETCH_NUM_BODYPEEK_UID
118. UID_FETCH_NUM_BODYSTRUCTURE
119. UID_FETCH_NUM_BODY_BODYMIMEALL_BODYMIMEPARTS_HEADER
120. UID_FETCH_NUM_BODY_BODYMIMEALL_HEADER
121. UID_FETCH_NUM_BODY_HEADER
122. UID_FETCH_NUM_ENVELOPE
123. UID_FETCH_NUM_FLAGS
124. UID_FETCH_NUM_RFC822SIZE
125. UID_FETCH_NUM_UID
126. UID_FETCH_NUM_UID_BODYPEEK_FLAGS_INTERNALDATE
127. UID_FETCH_NUM_UID_BODYPEEK_FLAGS_INTERNALDATE_RFC822SIZE
128. UID_FETCH_NUM_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
129. UID_FETCH_NUM_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
130. UID_FETCH_NUM_UID_BODYPEEK_RFC822SIZE
131. UID_FETCH_NUM_UID_BODY_RFC822SIZE
132. UID_FETCH_RANGE_UID_BODYPEEK_FLAGS_INTERNALDATE
133. UID_FETCH_RANGE_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
134. UID_FETCH_RANGE_UID_BODYPEEK_RFC822SIZE
135. UID_FETCH_RANGE_UID_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
136. UID_FETCH_RANGE_UID_FLAGS
137. UID_FETCH_RANGE_UID_RFC822SIZE_BODYPEEK_HEADERFIELDS
138. UID_FETCH_RANGE_UID_UID_BODYPEEK_HEADER_HEADERFIELDS_FLAGS_FLAGS_RFC822SIZE_RFC822SIZE_UID
139. UID_FETCH_SERIES_UID_BODYPEEK_FLAGS_INTERNALDATE
140. UID_FETCH_SERIES_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
141. UID_FETCH_SERIES_UID_BODYPEEK_RFC822SIZE
142. UID_FETCH_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
143. UID_FETCH_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
144. UID_FETCH_UNTILEND_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE_UID
145. UID_FETCH_UNTILEND_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE_UID
146. UID_FETCH_UNTILEND_FLAGS
147. UID_FETCH_UNTILEND_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
148. UID_FETCH_UNTILEND_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
149. UID_FETCH_UNTILEND_UID_FLAGS
150. UID_FETCH_UNTILEND_UID_FLAGS_INTERNALDATE_RFC822HEADER_RFC822SIZE
151. UID_SEARCH_ANSWERED
152. UID_SEARCH_DELETED
153. UID_SEARCH_FLAGGED
154. UID_SEARCH_HEADER_QUESTION_RFCHEADER_UNDELETED
155. UID_SEARCH_HEADER_RFCHEADER_UNDELETED
156. UID_SEARCH_HEADER_UNDELETED
157. UID_SEARCH_KEYWORD
158. UID_SEARCH_NOTDELETED_UID_UNTILEND
159. UID_SEARCH_RFCHEADER_UNDELETED
160. UID_SEARCH_SEEN
161. UID_SEARCH_SINCE
162. UID_SEARCH_UID_NUM
163. UID_SEARCH_UID_NUM_NOTDELETED
164. UID_SEARCH_UID_RANGE
165. UID_SEARCH_UID_RANGE_NOTDELETED
166. UID_SEARCH_UID_UNTILEND_UNDELETED_UNDRAFT_UNSEEN
167. UID_SEARCH_UID_UNTILEND_UNDELETED_UNSEEN
168. UID_SEARCH_UNDELETED
169. UID_SEARCH_UNDELETED_UNSEEN
170. UID_SEARCH_UNSEEN
171. UID_SEARCH_UNTILEND
172. UID_STORE_NUM_SET_FLAGS_ANSWERED
173. UID_STORE_NUM_SET_FLAGS_ANSWERED_DELETED_SEEN
174. UID_STORE_NUM_SET_FLAGS_ANSWERED_SEEN
175. UID_STORE_NUM_SET_FLAGS_DELETED
176. UID_STORE_NUM_SET_FLAGS_DELETED_SEEN
177. UID_STORE_NUM_SET_FLAGS_FLAGGED
178. UID_STORE_NUM_SET_FLAGS_SEEN
179. UID_STORE_NUM_SET_FLAGS_SEEN_ANSWERED
180. UID_STORE_NUM_SET_FLAGS_SEEN_DELETED
181. UID_STORE_NUM_UNSET_FLAGS
182. UID_STORE_NUM_UNSET_FLAGS_ANSWERED
183. UID_STORE_NUM_UNSET_FLAGS_DELETED
184. UID_STORE_NUM_UNSET_FLAGS_FLAGGED
185. UID_STORE_NUM_UNSET_FLAGS_FLAGGED_ANSWERED
186. UID_STORE_NUM_UNSET_FLAGS_FLAGGED_FORWARDED_MDNSENT_DELETED_DRAFT
187. UID_STORE_NUM_UNSET_FLAGS_SEEN
188. UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED
189. UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED
190. UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED_DRAFT_FLAGGED
191. UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED_FLAGGED
192. UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_FLAGGED
193. UID_STORE_NUM_UNSET_FLAGS_SEEN_DELETED
194. UID_STORE_NUM_UNSET_FLAGS_SEEN_FLAGGED
195. UID_STORE_NUM_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
196. UID_STORE_NUM_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_DELETED_DRAFT_FLAGGED
197. UID_STORE_NUM_UNSET_FLAGS_SEEN_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
198. UID_STORE_RANGE_SET_FLAGS_ANSWERED
199. UID_STORE_RANGE_SET_FLAGS_DELETED
200. UID_STORE_RANGE_SET_FLAGS_DELETED_SEEN
201. UID_STORE_RANGE_SET_FLAGS_SEEN
202. UID_STORE_RANGE_SET_FLAGS_SEEN_DELETED
203. UID_STORE_RANGE_UNSET_FLAGS
204. UID_STORE_RANGE_UNSET_FLAGS_ANSWERED_FORWARDED_MDNSENT_DELETED_DRAFT_FLAGGED
205. UID_STORE_RANGE_UNSET_FLAGS_DELETED
206. UID_STORE_RANGE_UNSET_FLAGS_SEEN
207. UID_STORE_RANGE_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
208. UID_STORE_SERIES_SET_FLAGS_DELETED
209. UID_STORE_SERIES_SET_FLAGS_DELETED_SEEN
210. UID_STORE_SERIES_SET_FLAGS_SEEN
211. UID_STORE_SERIES_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
212. UID_STORE_UNSET_FLAGS_SEEN
213. UNSUBSCRIBE_FOLDER
214. SEARCH_ALL_CALL_INFORMATION
215. UID_COPY_NUM_
216. UID_COPY_NUM_TRASH
217. UID_COPY_RANGE_TRASH
218. UID_COPY_SERIES_TRASH
219. UID_FETCH_NUM_BODYPEEK_RFC822SIZE_UID
220. UID_FETCH_NUM_BODY_RFC822SIZE_UID
221. UID_FETCH_NUM_UID_BODYPEEK_HEADER_FLAGS_RFC822SIZE
222. UID_FETCH_RANGE_UID_BODYPEEK_HEADER_FLAGS_RFC822SIZE
223. LSUB
224. SUBSCRIBE_TRASH
225. UID_COPY_NUM
226. UID_COPY_RANGE
227. UID_COPY_SERIES
228. UID_FETCH_NUM_BODYMIMEALL
229. UID_FETCH_NUM_UID_BODYSTRUCTURE
230. UID_FETCH_RANGE_BODYPEEK_HEADERFIELDS
231. UID_FETCH_UNTILEND_FLAGS_RFC822SIZE
232. SESSION_START

 

Functionally, there are many redundant states. However, it was felt that the difference between using the message unique identifier (UID) and the variable message index number is significant. The UID remains fixed for the life of the mailstore; the message index number changes as the number of messages changes, so a message has many relative index numbers across multiple IMAP sessions. For similar reasons, it was felt that operating on a contiguous range of messages generates a different workload than a random set of message numbers.
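The UID versus index-number distinction can be illustrated with a toy mailbox (pure illustration, not benchmark code): expunging a message renumbers the remaining messages, while their UIDs stay fixed.

```python
# Toy illustration of IMAP message numbering: UIDs are permanent, message
# sequence (index) numbers are just 1-based positions that shift after EXPUNGE.
mailbox = [101, 102, 103, 104]           # messages identified by UID

def sequence_number(uids, uid):
    """1-based position of the message with the given UID in the mailbox."""
    return uids.index(uid) + 1

print(sequence_number(mailbox, 103))     # 3
mailbox.remove(102)                      # EXPUNGE the message with UID 102
print(sequence_number(mailbox, 103))     # now 2, though the UID is unchanged
```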

The various FETCH commands against headers and body sections make more sense when divided according to command sequences and client types. Many of these are the other half of a folder probe, depending on the client type. The presence or absence of MIME parts also factors into the construction of the final command, as these lead to further probes of individual MIME parts.

The actual state transition charts used by the benchmark are too complex for this document. The Architecture White Paper provides the named objects used in the SPECmail2009 source code.

Inter-arrival Time Distributions

The uncertainty or randomness associated with the arrival of IMAP commands to the server is modeled using a Markov model, which specifies the statistical relationships between commands as transition-probability matrices. Each command sequence has its own probabilities and probability distributions of specific states. All entries in these transition-probability matrices were derived from a subset of the data samples. The base inter-arrival transition matrix uses a lognormal formula similar to the one used for SPECmail2001. Please consult the source code for the actual values.
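A Markov-driven command sequence of this kind can be sketched as follows. The states, transition probabilities, and lognormal parameters here are placeholders; the real values live in the SPECmail2009 source code.

```python
import random

# Illustrative transition-probability matrix for one command sequence.
# Each row maps a state to (next-state, probability) pairs summing to 1.
TRANSITIONS = {
    "LOGIN":           [("SELECT_INBOX", 1.0)],
    "SELECT_INBOX":    [("NOOP", 0.6), ("FETCH_NUM_FLAGS", 0.4)],
    "NOOP":            [("NOOP", 0.7), ("LOGOUT", 0.3)],
    "FETCH_NUM_FLAGS": [("LOGOUT", 1.0)],
}

def next_state(state, rng=random):
    """Choose the next command state by mapping a uniform draw onto the row."""
    u, cumulative = rng.random(), 0.0
    for target, prob in TRANSITIONS[state]:
        cumulative += prob
        if u < cumulative:
            return target
    return TRANSITIONS[state][-1][0]

def gap_seconds(rng=random):
    """One inter-command gap from a lognormal formula (placeholder mu/sigma)."""
    return rng.lognormvariate(-1.0, 0.5)
```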

IMAP Scaling Considerations

Since each IMAP user represents a variable number of IMAP sessions and incurs a considerable amount of storage space, scaling has problems at both ends of the mandated range. A further complication is the actual distribution of command sequences, since some occur much more frequently than others.

A benchmark run with too few users runs into the problem of missing command sequences as well as compliance with the mail store folder structures, message size and count distributions. Experiments have shown that a minimum of 250 users must be used for a compliant run that meets folder and message structure distributions.

The other end of the scaling problem lies in the number of IMAP users and the fact that each user represents one or more concurrent IMAP sessions. The IMAP benchmark apparently supports fewer users per load generator than SPECmail2001. But this is misleading, since the correct consideration is the number of concurrent client sessions and their activity levels within those sessions.

The POP3 benchmark defines very short session times - on the order of a few seconds for at least 75% of all POP3 sessions, which find no messages to retrieve. So despite the large number of defined POP3 users, the 25% of users who are active only log into the system four times during the peak hour. Furthermore, the typical POP3 session lasts only 2-5 seconds, executing at most 10 commands (25%) but usually three commands (75%) within each session. This means that only a small subset of users is actively connected at any one time.

In contrast, IMAP users are always connected and active. The number of IMAP users determines the minimum number of concurrent IMAP sessions that stay logged into the IMAP server for the entire peak-hour simulation. The client type distribution values then determine the number of ancillary IMAP sessions that will be generated. This means the IMAP server should allow at least

4.5 X UserCount

concurrent client socket connections.
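This sizing rule can be expressed as a one-line helper (an illustrative sketch of the 4.5-sessions-per-user figure stated above, not benchmark code):

```python
import math

def min_concurrent_connections(user_count):
    """Minimum socket connections per the 4.5-sessions-per-IMAP-user rule."""
    return math.ceil(4.5 * user_count)

print(min_concurrent_connections(1000))  # 4500
```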


Copyright © 2001-2009 Standard Performance Evaluation Corporation

All Rights Reserved