SPEC OSG SPECmail2009 Benchmark
Workload Characterization for SPECmail_Ent2009 Metric

Mike Abbott, Yun-seng Chao

December 2008

Summary

This document summarizes the studies on mail server workload collected from multiple university and corporate sources, using a variety of IMAP4 clients. The analyzed workloads consist of both SMTP and IMAP4 requests. Each request is described by parameters which fully characterize its behavior. The proposed models, which are obtained by analyzing these parameters, are able to reproduce the behavior of the mail server workloads.

Document Organization

The report is organized as follows. We start with a description of the measurements and of the parameters considered in our studies. We then present the models characterizing the mail server workloads and we briefly describe how to use these models.

SPECmail2009 Additions/Changes

Much of the document discusses the workload changes between the new SPECmail2009 and the original SPECmail2008 benchmark workload. Many of the internal distributions were updated with complete message and folder profiles provided by Apple, Inc. in 2008. Most of this data replaces the original message and mailbox composition distributions. The SMTP traffic levels have been incorporated into the recipient and message size distributions.

One workload addition not discussed in this document is the ability to test using encrypted TCP connections. The reason lies in where this encryption incurs its cost. The e-mail clients issue commands according to user or programmatic directives, regardless of the network connection's encryption mode. Empirical data shows that both the SUT and the e-mail clients require extra computing and/or memory resources when encryption is enabled. Therefore, the benchmark's Secure metric influences the number of concurrent network sessions and interarrival times but not the actual command sequences. The two SPECmail2009 metrics show the effects of encrypted network connections on the SUT.

Measurements and Parameters

The measurements analyzed in our studies come from different sources. The measurements related to SMTP and IMAP4 have been provided by four companies and by two universities. The collected sessions were divided into five IMAP4 and two SMTP groups. The sessions within each group form the basis for all of the parameters that define the Enterprise User Profile, emulated by the SPECmail2009 benchmark.

IMAP Information Sources – Enterprise
Data Source	Total Number of Users	Number of IMAP Users	Data Source Type	Network Type
Mirapoint	223	223	Small company	LAN
Openwave	2500	500	Medium company	WAN
Sun	147	147	Medium workgroup	LAN
Apple	39,970	~30,000	Large corporation	LAN/WAN
University of Wollongong	Unknown		Medium University	LAN
Purdue University	Unknown		Medium University	LAN
SPECmail2009 (Enterprise Model)	42,000+ (250 Minimum)	32,000+ (250 Minimum)	Enterprise (Small to Large)	LAN/MAN (0% dialup)
SPECmail2008 (Enterprise Model)	250 (Minimum)	250 (Minimum)	Enterprise (Small to Medium)	LAN/MAN (1% dialup)
SPECmail2001 (Dialup ISP Model)	10,000	10,000	Consumer	Dialup (98% dialup)

Mailbox and Message Structures

The IMAP4 protocol allows email clients to create and maintain any number of folders and subfolders, in addition to the standard Inbox folder used in the SPECmail2001 POP3 user profile. The IMAP4 command set also allows email clients to ask the server to describe these structures. This information is independent of the delivery or retrieval protocols and so is treated outside of specific protocol and/or server context.

Multipurpose Internet Mail Extension (MIME) Profile

MIME is an internet attachment scheme, defined as a formal standard by RFCs 1521, 1522, and 1523. The Sun and Apple data sets provided detailed information about mailbox and message structure. Thus they form the basis for the following probability distribution tables used in the benchmark.

The initial processing of all message sizes distinguished between single part sizes and multipart sizes. The IMAP4 benchmark prioritizes individual MIME part size over the global message size distribution.

Single Part messages (Sun: 76% of total, Apple: 47% of total)

Use “Content-type: text/plain” or no content-type at all in message headers
Use subpart content size distribution

Multipart Message (Sun: 24% of total, Apple: 53% of total)

Use “Content-Type: multipart/mixed; boundary="xxxxxxxxx-counter"” or “Content-Type: multipart/alternative; boundary="xxxxxxxxx-counter"” in message headers
Use distributions for message part width and depth to help establish the set of multipart message bodies.
Categorize MIME messages to fall into one of these pre-defined multipart buckets.
Use subpart content size distribution to define the sub-part sizes in the fixed pool of pre-defined multipart messages.
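
The two message shapes listed above can be sketched with Python's standard email package. This is only an illustration, not the benchmark's generator: the function names, the filler body text, and the use of the Apple 53% multipart share as the default are assumptions made for the example, and the part sizes passed in stand for values that would be drawn from the part size distribution.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def is_multipart_draw(u, multipart_probability=0.53):
    """Decide single part vs. multipart from a uniform draw u in [0, 1).

    0.53 is the Apple multipart share quoted above; treat it as a parameter.
    """
    return u < multipart_probability


def build_message(multipart, part_sizes, counter, subtype="mixed"):
    """Build a text/plain message or a multipart message (hypothetical helper).

    part_sizes: body sizes in bytes, one entry per MIME part.
    counter: suffix that keeps the boundary string unique per message.
    """
    if not multipart:
        # Single part message: a plain text body of the requested size.
        return MIMEText("x" * part_sizes[0], "plain")
    message = MIMEMultipart(subtype, boundary=f"xxxxxxxxx-{counter}")
    for size in part_sizes:
        message.attach(MIMEText("x" * size, "plain"))
    return message


example = build_message(is_multipart_draw(0.2), [512, 4096], counter=1)
print(example.get_content_type(), example.get_boundary())
```

Passing subtype="alternative" yields the multipart/alternative form named in the header list above.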

Below are the distributions used in constructing messages in compliance with the MIME standard.

MIME Part size (bytes) vs. Probabilities Distribution
Part Size	Probability (Sun)	Probability (Apple)	Part Size	Probability (Sun)	Probability (Apple)	Part Size	Probability (Sun)	Probability (Apple)
0	N/A	0.04%	256	10.5%	2.28%	128 KB	0.7%	1.88%
1	N/A	< 0.001%	512	15.6%	6.37%	256 KB	0.4%	1.21%
2	0.6%	< 0.01%	1 KB	13.6%	9.22%	512 KB	0.3%	0.68%
4	0.1%	< 0.01%	2 KB	13.9%	18.00%	1 MB	0.2%	0.45%
8	0.4%	< 0.01%	4 KB	13.4%	28.97%	2 MB	0.1%	0.27%
16	0.8%	< 0.01%	8 KB	8.5%	11.37%	4 MB	N/A	0.19%
32	1.8%	0.05%	16 KB	4.3%	6.46%	8 MB	N/A	0.10%
64	4.1%	0.31%	32 KB	2.3%	3.91%	16 MB	N/A	0.03%
128	7.2%	5.18%	64 KB	1.2%	3.02%	32 MB	N/A	0.01%
						64 MB	N/A	< 0.01%

MIME Distribution Chart

The following tables show the distribution of the number of MIME parts at the top level (without regard to nesting). It reflects the count of multipart/mixed parts immediately “attached” to the main message. It does not reflect any counting of multipart/alternative parts (i.e. text/plain and text/html, alternative formats of the same text). Nor does it reflect the MIME attachment depths (“attachments” to “attachments” or forwarded messages).

MIME Top-Level Part Counts Distribution
Part Count	Probability (Sun)	Probability (Apple)	Part Count	Probability (Sun)	Probability (Apple)	Part Count	Probability (Sun)	Probability (Apple)
0	N/A	46.69%	3	1.99%	2.51%	6	N/A	0.06%
1	75.76%	3.77%	4	0.24%	0.29%	7	N/A	0.07%
2	21.91%	46.20%	5	0.09%	0.26%	8+	N/A	0.15%

MIME Parts Chart

The next tables show the distribution of the nested MIME Part Levels that occur within a given message from the sample of MIME parts. It generally reflects messages or attachments which are forwarded multiple times, each time adding another depth level to the resulting message.

Distribution of MIME Part Depths
Part Depth	Probability (Sun)	Probability (Apple)	Part Depth	Probability (Sun)	Probability (Apple)	Part Depth	Probability (Sun)	Probability (Apple)
0 or 1	91.24%	90.18%	3	0.87%	0.62%	5	0.03%	0.01%
2	7.73%	9.14%	4	0.13%	0.04%	6+	N/A	< 0.01%

MIME Depth Chart

The following tables show the distribution of primary MIME Content Type (not including subtype) of all the parts in the entire sample.

MIME Content Type Distribution
Content type	Probability (Sun)	Probability (Apple)	Content type	Probability (Sun)	Probability (Apple)
TEXT	92.193%	86.584%	IMAGE	0.888%	5.943%
APPLICATION	4.265%	6.971%	AUDIO	0.016%	0.018%
MESSAGE	2.633%	0.465%	VIDEO	0.004%	0.019%

MIME Types Chart

After Sun's values were reviewed, a former employee noted that the Unix company that provided MIME distributions tended to use more text messages. Other companies have more and larger MIME parts that have richer, non-textual content such as word processor documents, presentations, spreadsheets, web pages, calendar events, images, audio, and both rich and simple alternate MIME structures. The major effect of this shift is a tendency to increase the overall message sizes and to decrease the share of the Text content type in favor of the other categories.

However, an increase in Alternate structures does not eliminate the Text portion's counts. It just increases the counts of the other content types. Also, the IMAP server is not required to interpret the actual content of the MIME parts. It must extract the MIME part(s) and send the content, as is, to the IMAP4 client, which performs the interpretation. Therefore, the shift in Content Type distribution affects the MIME structure of the messages that the benchmark delivers to the SUT. The SUT must still deconstruct these MIME structures, but not the actual content.

Messages Per Folder

The following tables show the distribution of messages in folders at the first five levels.

Level by Level Message Probability Distributions - Mirapoint, Openwave, Sun
Top Level		Level 1		Level 2		Level 3		Level 4
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
0	16.4%	0	8.1%	0	6.1%	0	6.8%	0	1.0%
1	21.5%	1	31.9%	1	48.1%	1	49.5%	1	81.4%
2	3.4%	2	4.6%	2	3.2%	2	3.2%	2	1.0%
3	2.8%	3	2.9%	3	2.1%	3	3.2%	3	1.0%
4	2.1%	4	2.4%	4	2.7%	4	2.2%	5	1.0%
5	2.1%	5	2.0%	5	1.5%	5	2.0%	6	2.9%
6	1.7%	6	1.7%	6	2.3%	6	1.8%	20	4.9%
7	1.2%	7	1.6%	7	1.6%	7	1.8%	30	2.0%
8	1.5%	8	1.1%	8	1.5%	9	2.0%	40	1.0%
9	1.5%	9	1.1%	9	1.2%	10	1.1%	80	2.0%
20	7.3%	10	1.3%	20	7.8%	20	10.3%	200	2.0%
30	5.2%	20	7.8%	30	3.8%	30	4.1%
40	3.0%	30	5.3%	40	3.1%	40	3.1%
50	2.0%	40	3.8%	50	2.1%	50	1.3%
60	1.9%	50	2.6%	60	1.2%	70	1.8%
70	1.4%	60	1.8%	80	1.6%	100	1.3%
80	1.3%	70	1.6%	100	1.3%	200	2.2%
90	1.0%	80	1.3%	200	2.5%	600	1.4%
200	5.6%	90	1.1%	300	1.2%	3000	1.1%
300	3.0%	200	5.9%	500	1.2%
400	1.3%	300	1.9%	800	1.0%
500	1.0%	400	1.5%	2000	3.0%
600	1.1%	500	1.2%
1000	2.2%	700	1.6%
2000	3.1%	1000	1.2%
3000	1.5%	2000	1.5%
4000	2.3%	5000	1.2%

Level by Level Message Probability Distributions - Apple
Top Level		Level 1		Level 2		Level 3		Level 4
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
0	0.84%	0	32.83%	0	15.35%	0	10.21%	0	9.45%
1	2.10%	1	6.79%	1	8.21%	1	11.06%	1	9.64%
2	0.66%	2	3.96%	2	5.70%	2	6.40%	2	7.93%
3	0.47%	3	2.94%	3	4.31%	3	4.83%	3	6.06%
4	0.80%	4	2.31%	4	3.52%	4	4.05%	5	9.74%
5	0.87%	5	2.03%	5	2.97%	5	3.54%	6	4.02%
6	0.77%	6	1.74%	6	2.56%	6	2.94%	20	25.41%
7	0.95%	7	1.50%	7	2.22%	7	2.76%	30	6.47%
8	0.75%	8	1.35%	8	2.01%	9	4.61%	40	4.52%
9	0.6%	9	1.26%	9	1.85%	10	1.97%	80	6.90%
20	6.07%	10	1.16%	20	12.57%	20	12.82%	200	9.88%
30	4.10%	20	7.82%	30	6.28%	30	6.94%
40	3.75%	30	4.57%	40	4.07%	40	4.22%
50	3.01%	40	3.13%	50	2.97%	50	2.96%
60	2.83%	50	2.40%	60	2.26%	70	3.97%
70	2.62%	60	1.84%	80	3.44%	100	3.39%
80	2.08%	70	1.48%	100	2.37%	200	5.25%
90	2.14%	80	1.29%	200	6.10%	600	7.04%
200	14.91%	90	1.09%	300		3000	1.07%
300		200	6.54%	500	5.11%
400		300		800	2.55%
500	17.52%	400		2000	3.58%
600		500	5.18%
1000	11.03%	700
2000	8.22%	1000	2.77%
3000		2000	1.92%
4000	12.91%	5000	2.09%

Message Distribution Charts 1 through 5

Here is the same data from Apple bucketed such that each contains roughly five percentage points. These are the actual values used in the benchmark.

Level by Level Message Probability Distributions - Apple
Top Level		Level 1		Level 2		Level 3		Level 4
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
0	0.84%	0	32.83%	0	15.35%	0	10.21%	0	9.45%
5	4.90%	1	6.79%	1	8.21%	1	11.06%	1	9.64%
12	5.00%	3	6.89%	2	5.70%	2	6.40%	2	7.93%
22	5.07%	6	6.08%	4	7.83%	4	8.88%	3	6.06%
35	5.19%	10	5.27%	6	5.53%	6	6.48%	4	5.14%
51	5.01%	16	5.28%	9	6.08%	8	5.31%	6	8.62%
70	5.15%	25	5.03%	13	5.73%	11	5.71%	8	6.25%
95	5.16%	40	5.21%	18	5.19%	15	5.86%	11	6.61%
127	5.10%	65	5.01%	25	5.09%	20	5.28%	14	5.47%
165	5.09%	111	5.00%	35	5.07%	27	5.15%	19	5.95%
212	5.01%	212	5.01%	51	5.06%	38	5.27%	26	5.51%
274	5.02%	524	5.00%	77	5.06%	56	5.15%	36	5.12%
356	5.05%	2577	5.00%	126	5.05%	91	5.06%	55	5.11%
466	5.03%	3000+	1.60%	239	5.00%	169	5.01%	104	5.05%
623	5.02%			654	5.00%	462	5.00%	359	5.01%
855	5.01%			2000+	5.05%	1000+	4.17%	500+	3.08%
1232	5.01%
1922	5.00%
3275	5.01%
4000+	8.33%

Mailbox Distribution Profile

A mail server that supports IMAP is likely to support a hierarchy of several mailboxes (also known as folders) in addition to the default INBOX mailbox for each user. Below are several distributions used to construct the structure of mailboxes contained within a mailstore supported by IMAP. The data used is extracted from the four enterprise data samples (Mirapoint, Openwave, Sun, Apple).

The following tables show the probability of an individual user having a certain number of mailboxes (aka folders) at each level (depth). The data reflects the probability distributions for the first five (5) levels, even though the actual samples went many levels deeper than that.

Level by Level Subfolder Probability Distributions - Mirapoint, Openwave, Sun
Top to Level 1		Level 1 to 2		Level 2 to 3		Level 3 to 4		Level 4 to 5
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
1	34.9%	1	31.4%	1	43.0%	1	39.6%	1	36.8%
2	21.7%	2	12.4%	2	14.9%	2	12.6%	2	7.9%
3	11.6%	3	7.4%	3	9.1%	3	8.1%	3	39.5%
4	7.0%	4	5.6%	4	6.8%	4	10.8%	4	5.3%
5	2.0%	5	4.0%	5	3.5%	5	2.7%	6	2.6%
6	2.4%	6	2.4%	6	4.1%	6	7.2%	7	2.6%
7	1.5%	7	5.0%	7	2.0%	7	2.7%	8	5.3%
8	0.7%	9	5.8%	8	2.0%	8	0.9%
9	0.7%	10	2.6%	9	3.3%	9	1.8%
10	0.7%	15	7.4%	10	1.0%	14	3.6%
20	8.1%	20	3.2%	20	5.8%	15	0.9%
30	3.7%	30	7.2%	30	3.0%	20	3.6%
40	1.8%	70	3.4%	40	0.5%	25	2.7%
50	2.0%	200	1.8%	50	0.5%	30	1.8%
103	1.3%	246	0.4%	61	0.3%	42	0.9%

Level by Level Subfolder Probability Distributions - Apple
Top to Level 1		Level 1 to 2		Level 2 to 3		Level 3 to 4		Level 4 to 5
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
1	0.38%	1	37.28%	1	38.86%	1	41.69%	1	37.52%
2	0.71%	2	14.13%	2	17.28%	2	17.26%	2	23.47%
3	41.11%	3	12.26%	3	10.23%	3	10.82%	3	13.88%
4	17.15%	4	6.60%	4	7.07%	4	6.71%	4	6.94%
5	8.48%	5	5.41%	5	5.13%	5	5.30%	5	3.47%
6	5.59%	6	4.09%	6	3.69%	6	3.56%	6	3.64%
7	4.01%	7	3.14%	7	3.06%	7	2.51%	7	1.98%
8	3.24%	8	2.57%	8	2.18%	8	1.78%	8	1.16%
9	2.66%	9	2.08%	9	1.97%	9	1.74%	9	0.99%
10	2.04%	10	1.66%	10	1.77%	10	1.10%	10	1.16%
15	6.57%	15	4.97%	15	4.07%	15	3.79%	15	2.64%
20	3.28%	20	2.40%	20	1.94%	20	2.15%	20	1.32%
25	1.77%	25	1.17%	25	0.77%	25	0.64%	25	1.32%
50	2.49%	50	1.66%	50	1.66%	50	0.82%	50	0.33%
100	0.40%	100	0.38%	100	0.27%	100	0.05%
500	0.10%	500	0.17%	500	0.06%	500	0.09%
501+	0.01%	501+	0.02%					501+	0.17%

Folder Distribution Charts 1 through 5

The following tables show the percent of folders at each level containing any subfolders.

Level by Level Folders With Any Subfolders - Mirapoint, Openwave, Sun
Top to Level 1		Level 1 to 2		Level 2 to 3		Level 3 to 4		Level 4 to 5
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
0	59.0%	0	64.0%	0	80.0%	0	78.4%	0	97.4%
1	21.9%	1	20.6%	1	15.7%	1	14.4%	1	2.6%
2	7.5%	2	9.0%	2	2.3%	2	3.6%
3	3.3%	3	1.4%	3	0.8%	3	1.8%
4	2.0%	4	1.2%	4	1.0%	4	1.8%
5	0.4%	5	0.8%	6	0.2%
6	1.3%	6	1.0%
7	2.0%	7	0.6%
8	0.4%	8	0.2%
9	0.7%	9	0.2%
10	0.7%	10	0.4%
11	0.4%	15	0.4%
21	0.25%	19	0.2%
26	0.15%

Level by Level Folders With Any Subfolders - Apple
Top to Level 1		Level 1 to 2		Level 2 to 3		Level 3 to 4		Level 4 to 5
Width	Probability	Width	Probability	Width	Probability	Width	Probability	Width	Probability
0	94.58%	0	90.62%	0	92.14%	0	92.81%	0	95.25%
1	2.02%	1	3.65%	1	3.28%	1	2.70%	1	2.04%
2	0.77%	2	1.62%	2	1.36%	2	1.69%	2	1.10%
3	0.67%	3	0.96%	3	0.85%	3	1.00%	5	0.98%
4	0.36%	4	0.66%	4	0.53%	5	0.75%	10	0.63%
5	0.29%	5	0.48%	5	0.42%	10	1.05%
6	0.22%	6	0.35%	6	0.28%
7	0.17%	7	0.29%	7	0.20%
8	0.14%	8	0.20%	8	0.14%
9	0.11%	9	0.18%	9	0.14%
10	0.09%	10	0.17%	10	0.66%
20	0.58%	15	0.82%

Subfolder Distribution Charts 1 through 5

Mailbox Structure Example

Below is a walkthrough of the construction of a folder tree, with a diagram to illustrate the use of the above distribution tables in creating a folder tree for user “U1”. The probability values used are only examples, not actual distribution table entries.

Folder Level Construction for User “U1” Example
Level	Next Level	Probability Computation	Diagram Representation
0	1	10.1% probability of 10 sub-folders	Create folders A1 through A10.
		7.2% probability of 2 folders having sub-folders	Mark folders A5 and A10 red to indicate presence of Level 2 sub-folders
1	2	6.3% probability of 7 sub-folders under A5	Create folders B1 through B7 under A5
		23.5% probability of 1 sub-folder under A10	Folder B1 under A10
2	3	5.4% probability of 1 folder under A5 having any sub-folders	Mark folder A5.B5 red to indicate presence of Level 3 sub-folders
		32.4% probability of 0 folders under A10 having any sub-folders	No subfolders under A10.B1
		35.8% probability of 1 level-3 sub-folder under A5.B5	Create folder C1 under A5.B5
3	4	56.8% probability of 0 folders under A5.B5 having any sub-folders	No subfolders under A5.B5.C1

The diagram below shows the mailbox structure for user U1.

Figure 1: Mailbox Structure Diagram

Mailbox folder structure for User "U1" with ten (10) Level 1 subfolders, seven (7) subfolders under A5, one (1) subfolder under A10, and only one Level 2 subfolder, A5.B5, having a subfolder of its own.
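
The level-by-level procedure in the walkthrough can be sketched as a short recursive routine. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the function names are hypothetical, the distributions are placeholders in the same spirit as the example probabilities above, and folders are named A1, B1, C1 and so on to match the walkthrough.

```python
import random


def sample(dist, rng):
    """Draw a value from [(value, probability), ...] by walking the cumulative sum."""
    u = rng.random()
    cumulative = 0.0
    for value, probability in dist:
        cumulative += probability
        if u < cumulative:
            return value
    return dist[-1][0]  # guard for probabilities that sum to slightly under 1


def build_folders(level, width_dists, parent_dists, rng):
    """Return {folder name: subtree} for the folders created under one folder.

    width_dists[level]: how many sub-folders a folder at `level` receives.
    parent_dists[level]: how many of those sub-folders get sub-folders themselves.
    """
    if level >= len(width_dists):
        return {}
    width = sample(width_dists[level], rng)
    names = [f"{chr(ord('A') + level)}{i}" for i in range(1, width + 1)]
    parent_count = min(width, sample(parent_dists[level], rng))
    parents = set(rng.sample(names, parent_count))
    tree = {}
    for name in names:
        if name in parents:
            tree[name] = build_folders(level + 1, width_dists, parent_dists, rng)
        else:
            tree[name] = {}
    return tree


# Placeholder distributions (not table entries), one list per level.
width_dists = [[(10, 1.0)], [(1, 0.5), (7, 0.5)], [(1, 1.0)]]
parent_dists = [[(2, 1.0)], [(0, 0.5), (1, 0.5)], [(0, 1.0)]]
tree = build_folders(0, width_dists, parent_dists, random.Random(1))
print(sorted(name for name, subtree in tree.items() if subtree))
```

Each recursive call mirrors one row pair of the walkthrough table: first the number of sub-folders is drawn, then the number of those sub-folders that are marked as having children of their own.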

Peak Hour Determination

The overall peak traffic hour must be based on both SMTP and the corresponding IMAP activity over the same period of time. Therefore, only two data samples were used to determine the relative workloads – Mirapoint and Openwave. The other data samples did not provide corresponding SMTP logs for this purpose.

Peak Hour Traffic Volumes and Active Users

The following table shows the overall traffic volumes and users over the course of the peak day (determined by total number of message activity from the data samples).

Peak Mail Server Traffic – Enterprise
Data	Mirapoint Samples				Openwave Samples
Sample Hour	SMTP	IMAP	Combined	Unique Sender/Rcpt	SMTP	IMAP	Combined	Unique Sender/Rcpt
0	503	57	560		1169	5166	6335
1	571	60	631		1289	5435	6724
2	519	60	579		1033	4319	5352
3	456	60	516		1114	4210	5324
4	479	60	539		1158	4054	5212
5	503	63	566		1076	3777	4853
6	550	60	610		1108	3503	4611
7	869	103	972		1042	4566	5608
8	942	606	1548		1449	7383	8832
9	1198	1075	2273		2174	7315	9489
10	1029	2278	3307	90/160	2082	7247	9329
11	987	23015	24002		2217	6331	8548
12	874	2052	2926		2079	6186	8265
13	978	1507	2485		2120	7784	9904
14	1560	1235	2795		2818	8246	11064
15	1485	1119	2604		3809	10196	14005
16	841	783	1624		4846	10620	15466	1836/1836
17	803	541	1344		5665	9306	14971
18	502	466	968		5513	8504	14017
19	360	412	772		5125	6462	11587
20	316	249	565		4177	6260	10437
21	476	215	691		4440	6067	10507
22	377	218	595		4271	6133	10404
23	340	229	569		4004	6178	10182
Daily	17,518	36,523	30,039	238/168	65,778	155,248	155,248	2,254/3,000

The Peak Hour Percent of Active Users is computed by first using the larger of the two unique Sending or Recipient users and dividing that value by the total number of provisioned users. The percentages are then pro-rated based on the relative number of actual users to compute the actual Percent of Active Users used in the benchmark.

Peak Hour Percent Provisioned Users
Company (Source)	Data Type	Number of Users	Percent PH/Prov
Mirapoint	Peak Hour	160
	Provisioned	269	59%
Openwave	Peak Hour	1,836
	Provisioned	2,299	80%
Normalized PH Percent Active Users			78%

PEAK_PCT_USERS = "78"
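
The pro-rating step can be checked with a few lines of arithmetic. The sketch below assumes that "pro-rated" means weighting each source's percentage by its share of provisioned users; the document does not spell out the weighting, but this reading reproduces the 78% figure from the table values. The function name is hypothetical.

```python
def prorated_percent(samples):
    """Weight each source's peak-hour/provisioned ratio by its provisioned users.

    samples: list of (peak_hour_users, provisioned_users) pairs.
    """
    total_provisioned = sum(provisioned for _, provisioned in samples)
    weighted = sum(
        (peak / provisioned) * (provisioned / total_provisioned)
        for peak, provisioned in samples
    )
    return 100.0 * weighted


# (peak-hour users, provisioned users) for Mirapoint and Openwave, from the table above.
samples = [(160, 269), (1836, 2299)]
for peak, provisioned in samples:
    print(round(100.0 * peak / provisioned))
print(round(prorated_percent(samples)))
```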

The Peak Hour Activity Percentage can be derived by using the traffic volume from the peak hour and the daily total for each protocol. Again the benchmark value is computed by pro-rating each data sample within the overall user counts.

Peak Hour Percent Of Daily Traffic by Protocol
Company	SMTP	IMAP	Combined
Mirapoint	6%	6%	11%
Openwave	7%	7%	10%
Normalized PH Percent of Daily Traffic			10%

SMTP log files

SMTP Traffic Analysis

The SMTP log files reflect mail transfer agent workloads from the four enterprises, three collected over the course of fourteen (14) to thirty (30) days of operation and one from one day. The workload refers to all the requests processed by the mail server for delivering incoming and outgoing messages. These enterprises ranged from approximately 120 to 40,000 users. The data logs cover the full 24-hour day, over the course of the data collection period.

The parameters used to describe the requests processed by the mail server are:

time stamp of the request
size [byte]
number of recipients

The table below shows the statistics for SMTP traffic flows and message sizes for the four enterprises. The ISP user profile statistics are included to illustrate the difference with the original user model.

SMTP In/Out-bound Traffic – Enterprise
Data Source	Percent Inbound Traffic	Percent Outbound Traffic	Average Message Size (KB)	Data Source Type
Mirapoint	85%	15%	24	Small company
Openwave	92%	8%	44	Medium company
Sun	98%	2%	23	Medium workgroup
Apple	Unknown	Unknown	105	Large corporation
SPECmail2009 Enterprise Model	93%	7%	101	Pro-rated medium/large company
SPECmail2008 Enterprise Model	93%	7%	38	Pro-rated small/medium company
SPECmail2001 (Dialup ISP Model)	53%	47%	25	Consumer Dialup

The following two tables contain the profile of the number of recipients per message, based on the Mirapoint and Openwave SMTP data from the busiest day of the week. The Apple SMTP sample did not distinguish between remote and local recipients, nor did it record the actual recipient count from the RCPT TO step, so this version reuses the original recipient information. This data was extracted both from recipients named in the RCPT TO lines and from recipient counts based on the mailing list expansions. The benchmark uses the probability distributions in the second table to generate the actual SMTP traffic.

Peak Hour SMTP Message Rate Comparison
Company (Source)	Data Type	Peak Hour Total Mesg/User	Daily Total Mesg/User	Peak Hour Mesg per Unique User
Mirapoint	Sender	11.4	73.6	8.7
	Recipient	6.4	65.1	6.7
Openwave	Sender	2.6	29.2	0.5
	Recipient	2.6	21.9	4.9
Normalized PH Messages Per User				5

Peak Hour From/To Analysis
Company (Source)	Data Type	From Local to Remote	From Local to Local	From Remote to Local
Mirapoint	Count	84	431	262
	% of Total	11%	55%	34%
Openwave	Count	195	789	429
	% of Total	14%	56%	30%
Normalized PH SMTP Message Flow		13%	56%	31%

SMTP Recipients per Message – Enterprise
Data Source	Minimum	Average	Maximum
Mirapoint	1	2.0	133
Openwave	1	3.3	74
Sun Microsystems	n/a	n/a	n/a
Apple	1	3.9	2061
SPECmail2009 Benchmark	1	3.8	2061
SPECmail2008 Benchmark	1	3.1	133
SPECmail2001 Benchmark	1	2	20

SMTP Recipients per Message Distribution
Recipients	Probability (Mirapoint, Openwave, Sun)	Probability (Apple)	Recipients	Probability (Mirapoint, Openwave, Sun)	Probability (Apple)
1	46.3875%	75.11%	13	0.10%
2	11.00%	8.03%	14	0.05%
3	9.00%	6.08%	15	0.05%	2.44%
4	8.00%	1.59%	16	0.05%
5	7.00%	1.10%	20	N/A	0.69%
6	6.00%	1.48%	25	N/A	0.57%
7	5.00%	0.61%	30	0.05%
8	4.00%	0.40%	50	0.01%	0.69%
9	N/A	0.34%	100	0.0025%	0.39%
10	2.00%	0.30%	500	N/A	0.14%
11	1.00%	N/A	1000	N/A	0.03%
12	0.30%	N/A	5000	N/A	0.002%

SMTP Recipient Chart

As stated above, the recipient distribution includes traffic routed through mailing lists. The data showed that 7-15% of overall SMTP traffic was sent to a mail distribution list. This data is included in the recipient distributions above, but described below. The benchmark does not require creation of any distribution lists.

Mailing List Count Profile
Source	Minimum	Average	Maximum
Mirapoint	n/a	n/a	n/a
Openwave	1	12	58
Sun	n/a	n/a	n/a
Apple	1	18	2061
SPECmail2001	n/a	n/a	n/a

Mailing List Count Distribution - Openwave
Recipients	Probability	Recipients	Probability	Recipients	Probability
1	15.2%	11	2.2%	22	1.0%
2	8.4%	12	2.5%	24	1.8%
3	5.5%	13	2.2%	25	1.3%
4	6.0%	14	2.3%	28	1.1%
5	6.9%	15	1.4%	30	0.8%
6	6.9%	16	1.8%	33	0.9%
7	4.4%	17	2.0%	40	2.5%
8	4.3%	18	1.1%	50	2.0%
9	4.8%	19	0.5%	60	4.7%
10	3.6%	21	2.0%

Mailing List Count Distribution - Apple
Recipients	Probability	Recipients	Probability	Recipients	Probability
1	9.10%	11	5.09%	25	3.67%
2	6.70%	12	1.61%	50	4.43%
3	25.60%	13	2.89%	100	2.56%
4	5.62%	14	3.85%	500	0.89%
5	4.65%	15	1.74%	1000	0.20%
6	8.72%	16	0.58%	5000	0.01%
7	3.23%	17	1.10%
8	2.14%	18	0.77%
9	1.80%	19	0.95%
10	1.39%	20	0.72%

List Recipient Chart

SMTP Message Analysis

The SPECmail2001 method created a single level message that met a fixed message size distribution. Originally, SPECmail2009 attempted to follow the same criteria: generate messages according to the MIME distributions and then map these messages to the final SMTP derived message size distribution. However, these MIME distribution compliant messages did not comply with the SMTP log derived message size distribution. Analysis found the main reason for this size discrepancy: the IMAP message samples form only a subset of the messages flowing through the e-mail system. The POP3 users on these same e-mail servers delete a substantial proportion of their messages, so those messages do not remain on the e-mail server. Therefore, these missing messages did not contribute to the MIME definitions of the whole mail store.

Subsequent benchmark design choices prioritized message MIME parts structure and enclosure sizes over the derived SMTP message sizes. The SMTP log derived findings are presented here, but not used by the benchmark.

The SMTP log derived message size data differs between Consumer and Enterprise users. As with the earlier SPECmail2001, the size of each message is counted in message size buckets. Overall, the average message size processed through the MTA increased from 24.5 KB to 38.57 KB, then to 101 KB.

Message Size Statistics from SMTP Samples (KB)
SPEC OSG Benchmark	Minimum (KB)	Average (KB)	Median (KB)	Maximum (MB)
SPECmail2001	1	24.5	2.5	2.7
SPECmail2008	0.05	38.57	3.56	21.0
SPECmail2009	2	105	6	139

The following tables describe the size distribution of all messages that flowed through the SMTP servers. This information is included here for completeness.

SMTP Message Size Probability Distribution
Message Size	Probability	Message Size	Probability
256	0.65%	16 KB	4.03%
512	6.46%	64 KB	4.25%
1 KB	17.50%	256 KB	2.39%
2 KB	31.90%	1 MB	0.87%
4 KB	22.47%	4 MB	0.32%
8 KB	9.12%	1 GB	0.04%

SMTP Message Size Probability Distribution - Apple
Message Size	Probability	Message Size	Probability
256		16 KB	11.30%
512		64 KB	15.43%
1 KB		256 KB	5.76%
2 KB	2.92%	1 MB	3.15%
4 KB	38.65%	4 MB	1.77%
8 KB	20.58%	1 GB	0.45%

SMTP Size Chart

The SPECmail2009 Enterprise message size distribution has shifted towards larger values. The original enterprise sample's median was in the 2 KB bucket. The newer data sample's median lies between 4 and 8 KB. A majority continues to be messages slightly less than 8 KB in size. However, SPECmail2009 creates significantly more large messages. For all messages greater than 16 KB in size, SPECmail2008 created about 11.9% compared to the 37.8% created by SPECmail2009.

SMTP Message Rates

The corporate SMTP samples showed the following characteristics.

SMTP Message Inter-Arrival Time
Mean (s)	Standard Deviation	Minimum	Maximum
2.80	2.37	0	15

SMTP Normalized Profile (Peak Hour)

Config Parameter	Value	Definition
PEAK_PCT_USERS	78	Percent of provisioned users receiving messages in the peak hour (also known as 'Active users').
MSG_RECEIVED_PER_PEAK_HOUR	5	Number of messages received by 'Active users' in the peak hour
LOCAL_TO_LOCAL_PCT	56	Percent of total messages sent from Local users to Local users
REMOTE_TO_LOCAL_PCT	31	Percent of total messages sent from Remote users to Local users
LOCAL_TO_REMOTE_PCT	13	Percent of total messages sent from Local users to Remote users
PEAK_LOAD_PERCENT	32	Percent of the daily load occurring during the peak hour

Workload models

We have built a model for each of the parameters characterizing the SMTP requests.

Inter-arrival time distribution

The message inter-arrival time computation uses a simplified model because the total number of messages tends not to be enough to populate a complex distribution. Therefore, messages are delivered at a constant rate, computed as the total number of messages to be delivered over the duration of the load test divided by that run time. The time between message deliveries is the reciprocal of this rate.

Message Arrival Rate (messages/s) = (Number of Active Users) X (Messages per User) X (Recipients per Message) / Load Test Time (s)
Inter-arrival Time (s) = 1 / Message Arrival Rate
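
As a worked illustration of the computation above, the sketch below evaluates the quotient (deliveries per second) together with its reciprocal (seconds between deliveries). The 250-user run size, the one-hour load test, and the single recipient per message are assumptions chosen for the example; only the 78% active users and 5 messages per user come from the normalized profile table. The function name is hypothetical.

```python
def smtp_delivery_pacing(active_users, messages_per_user,
                         recipients_per_message, load_test_seconds):
    """Return (deliveries per second, seconds between deliveries)."""
    total_deliveries = active_users * messages_per_user * recipients_per_message
    rate = total_deliveries / load_test_seconds
    interval = load_test_seconds / total_deliveries
    return rate, interval


provisioned_users = 250                          # hypothetical run size
active_users = provisioned_users * 78 // 100     # PEAK_PCT_USERS applied
rate, interval = smtp_delivery_pacing(
    active_users,
    messages_per_user=5,          # MSG_RECEIVED_PER_PEAK_HOUR
    recipients_per_message=1,     # assumption for the example
    load_test_seconds=3600,       # assumption: a one-hour run
)
print(f"{rate:.3f} deliveries/s, one every {interval:.2f} s")
```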

Message Construction

As described in the message size and MIME parts analysis, the benchmark chose to follow message structural and attachment size distributions rather than the total message size distribution used by the earlier SPECmail2001 benchmark. In that benchmark, the email server tends not to care about the actual message MIME structure and recognizes just headers versus body parts. IMAP4 email clients understand the concepts of attachments and expect the e-mail server to understand the various message parts. This meant that the e-mail server must evaluate the actual structure of each message. Therefore, message structure and individual attachment sizes affect the actual message size, since the MIME structural description is embedded in the message but not visible to most users.

The benchmark uses the above MIME Parts, MIME Part Sizes and MIME Depth distribution tables to construct each message stored in the mail store.

MIME_PART_SIZE = "64,0.40%; 128,5.18%; 256,2.28%; 512,6.37%; 1024,9.22%; 2048,18.00%; 4096,28.97%; 8192,11.37%; 16384,6.46%; 32768,3.91%; 65536,3.02%; 131072,1.88%; 262144,1.21%; 524288,0.68%; 1048576,0.45%; 2097152,0.60%"

Number of recipients

Unlike in the Consumer ISP user model, the Number of Recipients Per Message in the Enterprise user model is not overwhelmingly dominated by a single value. The effects of the internal distribution lists shifted the mean (5) away from the median (1) value. Also, since mail distribution lists tend to be used inside enterprises, the maximum recipient count was allowed to reach one hundred (100).

MSG_RECP_DISTRIBUTION = "1,75.11%; 2,8.03%; 3,6.08%; 4,1.59%; 5,1.10%; 6,1.48%; 7,0.61%; 8,0.40%; 9,0.34%; 10,0.30%; 15,2.44%; 20,0.69%; 25,0.57%; 50,0.69%; 100,0.39%; 500,0.18%"

How to use the models

The models described in the previous sections can be used to reproduce the behavior of the real workload of a mail server. In particular, the use of these models is based on sampling the various distributions identified for each of the three characterizing parameters.

To sample the Weibull distribution obtained for the inter-arrival times, it is necessary to invert the function and to derive the inter-arrival time from the probability distribution. In short, let u denote a random number uniformly distributed between 0 and 1; the inter-arrival time t_i between the i-th and (i-1)-th request is given by:

t_i = a * (-log(u))^(1/b)    (***)

where log denotes the natural logarithm, a and b are the parameters of the Weibull function. The procedure should then start by drawing a random number u and by computing the corresponding value of t using the previous formula. Note that u should be strictly greater than 0. For u=1, the value of the inter-arrival time is equal to zero.
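
A minimal sketch of this inversion follows. The parameter values a = 2.8 and b = 1.0 are placeholders chosen only for illustration (the document does not list fitted Weibull parameters), and the function names are hypothetical. Python's random() returns values in [0, 1), so the sketch uses 1 - random() to keep u strictly greater than 0.

```python
import math
import random


def weibull_interarrival(u, a, b):
    """Inverse-transform one uniform draw u in (0, 1] into an inter-arrival time."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    return a * (-math.log(u)) ** (1.0 / b)


def sample_interarrivals(count, a, b, rng):
    """Draw `count` inter-arrival times; 1 - random() lies in (0, 1], avoiding log(0)."""
    return [weibull_interarrival(1.0 - rng.random(), a, b) for _ in range(count)]


rng = random.Random(2009)
times = sample_interarrivals(5, a=2.8, b=1.0, rng=rng)  # a, b are placeholders
print([round(t, 2) for t in times])
```

With b = 1 the Weibull distribution reduces to an exponential distribution with mean a, which makes the placeholder values easy to sanity-check against a target mean.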

In the case of message size and number of recipients, it is necessary to sample the distribution obtained from the buckets. Again it is necessary to draw a random number uniformly distributed between 0 and 1.
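
Sampling a bucketed distribution amounts to accumulating the bucket probabilities and finding the first bucket whose cumulative probability exceeds the uniform draw. The sketch below parses the MSG_RECP_DISTRIBUTION string shown earlier and samples from it; the helper names are hypothetical and the parsing assumes the "value,percent%;" layout seen in that setting.

```python
import bisect
import random

MSG_RECP_DISTRIBUTION = (
    "1,75.11%; 2,8.03%; 3,6.08%; 4,1.59%; 5,1.10%; 6,1.48%; 7,0.61%; "
    "8,0.40%; 9,0.34%; 10,0.30%; 15,2.44%; 20,0.69%; 25,0.57%; 50,0.69%; "
    "100,0.39%; 500,0.18%"
)


def parse_distribution(spec):
    """Turn 'value,percent%; ...' into parallel lists of values and cumulative probabilities."""
    values, cumulative = [], []
    total = 0.0
    for entry in spec.split(";"):
        value, percent = entry.strip().split(",")
        total += float(percent.rstrip("%")) / 100.0
        values.append(int(value))
        cumulative.append(total)
    return values, cumulative


def sample_bucket(values, cumulative, u):
    """Return the bucket value selected by a uniform draw u in [0, 1)."""
    index = bisect.bisect_right(cumulative, u)
    # Clamp in case rounding leaves the final cumulative value just under 1.
    return values[min(index, len(values) - 1)]


values, cumulative = parse_distribution(MSG_RECP_DISTRIBUTION)
rng = random.Random(42)
print([sample_bucket(values, cumulative, rng.random()) for _ in range(10)])
```

The same two helpers apply unchanged to MIME_PART_SIZE, since it uses the same string layout.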

Scaling issues

Scaling issues arise when the workload model has to represent the load of mail servers with a smaller or larger number of users. This is particularly the case for request arrivals, whose rate depends on the number of users of the mail server. The SPECmail2001 benchmark assumed a linear behavior of the arrival rates, that is, the arrival rate of the requests of a mail server with 100,000 users is 10 times the arrival rate of a mail server with 10,000 users. However, the SPECmail2009 benchmark will never approach the lower limits mandated in the previous model. There are just not that many enterprises with 10,000 employees, much less 100,000. A second consideration is that the number of new messages arriving during the peak hour is only a very small portion of each user’s mail store.

Therefore, the messages arriving for each user are spread at regular intervals. The overall SMTP workload is determined by the message count multiplied by the number of users, and divided by sixty (60) seconds.
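The SMTP load calculation can be written out as a short worked example. This is a sketch under stated assumptions: the per-user message count and the user count are hypothetical, and the text does not state the period over which the message count is defined, so the result is simply the quantity described above.

```python
def smtp_workload(messages_per_user, user_count):
    """Message count multiplied by the number of users, divided by 60."""
    return messages_per_user * user_count / 60.0

# Hypothetical inputs, for illustration only.
MESSAGES_PER_USER = 2
USER_COUNT = 600

load = smtp_workload(MESSAGES_PER_USER, USER_COUNT)
print(f"overall SMTP load: {load:.1f}")
print(f"regular spacing between messages: {1.0 / load:.3f}")
```

Because the formula is linear in the number of users, ten times as many users gives ten times the load.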

IMAP4 log files

The IMAP data have been collected from various mail servers at the University of Wollongong, Purdue University, Mirapoint, Openwave and Sun. The measurements were collected for fourteen (14) to thirty (30) days. From these log files, IMAP commands from individual IMAP sessions were grouped together and analyzed.

The parameters used to describe each IMAP session are:

time stamp
number of messages and mailbox size
number and size of messages retrieved/deleted within a session

The table below shows the IMAP command percentages generated during each data sample’s peak hour.

IMAP Peak Hour – Based on % of Daily Traffic
Mirapoint	Openwave	Purdue	Sun	Wollongong	Average
13.0%	6.8%	8.6%	11.6%	13.3%	10.7%

The statistics refer to one working day (24 hours); we have observed that the behavior of the users is very different over week-end days.

IMAP Session Model

Analysis of tens of thousands of individual IMAP sessions led to the conclusion that different IMAP e-mail clients use different interaction models. Furthermore, these models were more complex than the POP3 sessions. Each IMAP e-mail client could initiate from one (1) to five (5) concurrent session(s), each with its own distinct session initiation patterns.

This leads to a complex IMAP Session model, defined by the combination of two categories: client-type and command-sequence. A command-sequence is a series of IMAP commands performing one or more mail operations within a specific session. A client-type is a collection of one or more command-sequences.

The following table describes the criteria for each command-sequence.

Command Sequence	Client Software	General Characteristic	Comments
1	Netscape (Mozilla), Pine, Mulberry	• Create connection • Perform several operations using a variety of commands (probing folders for new messages, deleting and moving messages, updating flags, listing available folders, appending messages, searching for messages, checkpointing, etc.) • Occasionally probe folders for new messages • Fetch headers if any messages arrived • Occasionally fetch body (whole or parts of body) • Focuses on a specific folder • Does not log out of session	This is one of the “primary” sessions that tend to stay logged into the IMAP server for many hours or days. Netscape uses UID commands; Pine and Mulberry do not. Probing folders is accomplished by: 1. Netscape: NOOP; UID FETCH n:* (FLAGS) 2. Mulberry: SEARCH UNSEEN; SEARCH DELETED; FETCH 1:m (FLAGS ENVELOPE BODYSTRUCTURE, …) 3. Pine: NOOP
2	Outlook, Outlook Express, Mulberry	• Create connection • Perform several operations using a variety of commands (probing folders for new messages, deleting and moving messages, updating flags, listing available folders, appending messages, searching for messages, checkpointing, etc.) • Occasionally fetch headers • Occasionally fetch header and whole body • Does *not* focus on a specific folder • Does not log out of session	This is one of the “primary” sessions that tend to stay logged into the IMAP server for many hours or days. Probing folders is accomplished by these IMAP commands: • UID FETCH n:* (UID, BODY.PEEK[HEADER], …) • UID FETCH 1:n-1 (UID FLAGS)
3	Fetchmail, Outlook Express	• Create connection • Fetch headers • Fetch whole body • Logout	These sessions are very sporadic and show dependency on results returned from Command Sequence 4.
4	Outlook, Outlook Express, Netscape - periodic or triggered actions	• Create connection • Occasionally probe folders for new messages • Occasionally issue other IMAP commands that do not alter the state of the mailstore (such as UNSUBSCRIBE or LIST) • Sometimes logs out, not always	These sessions show very automated behavior and are generated at fixed intervals for each user. Probing folders is accomplished by: • Outlook 2002 – Inbox: UID FETCH m:* (UID, BODY.PEEK[HEADER], …); or UID FETCH 1:n (UID FLAGS) • Outlook 2002 – Others: LSUB "" "*"; or STATUS "mailbox name 1" (UNSEEN); ..; STATUS "mailbox name n" (UNSEEN) • Outlook Express: STATUS "mailbox name" (MESSAGES UNSEEN)
5	Mulberry, Netscape	• Create connection • Occasionally list or probe folders • Perform specific tasks, such as deleting messages or appending messages • Alters the state of the mail store • Logout	These sessions tend to focus on a specific set of tasks and then log out of the IMAP server.

IMAP4 clients will use one or more of the five (5) command sequences. The IMAP4 benchmark emulates four (4) client types. During the benchmark run, each of these client type threads represents a single user. A client may connect one or more times to the IMAP servers.

IMAP Client Classifications and Sequence Map
Client Type	Component Command Sequence	Comments
1	1 4	These two (2) command sequences operate independently and concurrently. Some of these clients will use the message index number while others use the message UID.
2	1 4 5	These three (3) command sequences operate independently and concurrently. Some of these clients will use the message index number while others use the message UID.
3	2 3 4	Command sequence 3 IMAP commands and activities are based on the results from the other command sequences.
4	2 4 5	These three (3) command sequences operate independently and concurrently. The message index number is used instead of the message UID.

The compliant run uses the following combination to determine sequencing and dependencies.

CLIENT_TYPE_DISTRIBUTION = "1,31.373%; 3,32.353%; 4,3.922%; 5,2.941%; 13,3.922%; 14,10.784%; 15,1.961%; 24,0.980%; 34,2.941%; 45,2.941%; 134,0.980%; 145,3.922%; 1245,0.980%"

Each tuple defines the command sequence grouping (1 == CS1, 34 == CS3+CS4), and the percentage of overall load generator client threads that will implement each combination. The number of IMAP sessions varies as this matrix changes. Each load generator thread is assigned one specific combination.
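The tuple format can be illustrated by parsing the string and assigning combinations to load generator threads. This is a sketch only: the parsing and assignment functions are hypothetical and are not taken from the benchmark source code.

```python
import random

CLIENT_TYPE_DISTRIBUTION = (
    "1,31.373%; 3,32.353%; 4,3.922%; 5,2.941%; 13,3.922%; 14,10.784%; "
    "15,1.961%; 24,0.980%; 34,2.941%; 45,2.941%; 134,0.980%; 145,3.922%; "
    "1245,0.980%"
)

def parse_distribution(text):
    """Return a list of (command_sequences, percentage) tuples."""
    entries = []
    for item in text.split(";"):
        combo, percent = item.strip().split(",")
        sequences = tuple(int(digit) for digit in combo)  # "34" -> (3, 4)
        entries.append((sequences, float(percent.rstrip("%"))))
    return entries

def assign_threads(entries, thread_count, rng):
    """Give each thread one combination, weighted by its percentage."""
    combos = [seqs for seqs, _ in entries]
    weights = [pct for _, pct in entries]
    return rng.choices(combos, weights=weights, k=thread_count)

entries = parse_distribution(CLIENT_TYPE_DISTRIBUTION)
total_percent = sum(pct for _, pct in entries)
mean_sequences = sum(len(seqs) * pct for seqs, pct in entries) / total_percent
threads = assign_threads(entries, 250, random.Random(1))
print(f"{len(entries)} combinations, total {total_percent:.3f}%")
print(f"average command sequences per thread: {mean_sequences:.3f}")
```

The weighted average shows how the mix of combinations drives the number of command sequences each thread runs, which is why the session count changes when the matrix changes.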

IMAP Sample Selection

The extracted IMAP sessions were categorized according to command sequence types. However, some command sequences had an enormous number of sessions, while the sample counts of other command sequences corresponded to the number of users. The fact that each client type uses more than one command sequence also forces the IMAP session selection criteria to gather all related sessions.

The final selection criteria used all IMAP sessions grouped by IMAP user name (found in each session’s login state). The resulting data set provides a more coherent model of not only individual primary command sequences (the premise of the SPECmail2001 benchmark) but also the number of related IMAP sessions and actions.

IMAP Command States

The IMAP command set allows many combinations of parameters and options. This means that a single IMAP command can perform more than one logical task, and on one or more messages at the same time. The best example is the FETCH command and its variant, UID FETCH. This single command has been used to retrieve not only the message body, but also message meta-data and headers, and as a means to probe a folder for new messages. The latter (folder probe) is also complemented by the IMAP STATUS command, which provides a summary of old/new/deleted messages.

The versatility of the IMAP command set leads to a need to expand the concept of a state from a simple command to the specific combination of a command and its parameters. Included in this combination is the number of messages encompassed by that command state, as well as whether it operates on an individual message, a contiguous series or a disjoint set of messages.
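The idea of folding the message set into the state can be sketched as follows. The mapping below is an inference from the state names listed in this section (NUM, RANGE, SERIES, UNTILEND); it is an assumption made for illustration, not the classification code used by the benchmark, and it builds only the command and message-set part of a state name.

```python
import re

def classify_message_set(message_set):
    """Classify an IMAP message set by its shape (assumed naming)."""
    if "," in message_set:
        return "SERIES"      # disjoint set, e.g. "2,5,9"
    if re.fullmatch(r"\d+:\*", message_set):
        return "UNTILEND"    # open-ended range, e.g. "12:*"
    if re.fullmatch(r"\d+:\d+", message_set):
        return "RANGE"       # contiguous range, e.g. "1:20"
    if re.fullmatch(r"\d+", message_set):
        return "NUM"         # a single message
    raise ValueError(f"unrecognized message set: {message_set!r}")

def state_prefix(command_line):
    """Build the leading part of a state name from a command line."""
    tokens = command_line.split()
    if tokens[0].upper() == "UID":
        verb = "UID_" + tokens[1].upper()
        message_set = tokens[2]
    else:
        verb = tokens[0].upper()
        message_set = tokens[1]
    return f"{verb}_{classify_message_set(message_set)}"

print(state_prefix("UID FETCH 12:* (FLAGS)"))
print(state_prefix("FETCH 7 (UID)"))
```

A full state name would also append the fetched data items or flags, which this sketch leaves out.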

IMAP State Codes tracked by Mail2009 Benchmark
State Identifier	State Name
1.	APPEND
2.	CHECK
3.	CLOSE
4.	COPY_NUM_FOLDER
5.	COPY_RANGE_FOLDER
6.	CREATE
7.	DELETE
8.	EXAMINE_FOLDER
9.	EXAMINE_INBOX
10.	EXAMINE_INBOXSENT
11.	EXAMINE_SENT
12.	EXAMINE_SENT_ITEMS
13.	EXPUNGE
14.	FETCH_NUM
15.	FETCH_NUM_BODYALL
16.	FETCH_NUM_BODYPARTS
17.	FETCH_NUM_BODYPEEK
18.	FETCH_NUM_BODYPEEK_HEADER
19.	FETCH_NUM_BODYPEEK_HEADERFIELDS
20.	FETCH_NUM_BODYSTRUCTURE_FLAGS
21.	FETCH_NUM_BODY_BODYALL_HEADERFIELDS
22.	FETCH_NUM_BODY_HEADER
23.	FETCH_NUM_ENVELOPE_BODYPEEK_HEADERFIELDS_BODYSTRUCTURE_FLAGS_INTERNALDATE_RFC822SIZE
24.	FETCH_NUM_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
25.	FETCH_NUM_FLAGS
26.	FETCH_NUM_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
27.	FETCH_NUM_FLAGS_BODYSTRUCTURE_ENVELOPE_INTERNALDATE_RFC822SIZE_UID
28.	FETCH_NUM_RFC822HEADER
29.	FETCH_NUM_RFC822TEXT
30.	FETCH_NUM_UID
31.	FETCH_NUM_UID_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
32.	FETCH_RANGE_UID
33.	FETCH_RANGE_BODYPEEK_HEADERFIELDS
34.	FETCH_RANGE_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
35.	FETCH_RANGE_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
36.	FETCH_RANGE_FLAGS_BODYSTRUCTURE_ENVELOPE_INTERNALDATE_RFC822SIZE_UID
37.	FETCH_RANGE_UID_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
38.	FETCH_SERIES_ENVELOPE_BODYPEEK_HEADERFIELDS_FLAGS_INTERNALDATE_RFC822SIZE_UID
39.	FETCH_SERIES_ENVELOPE_BODYSTRUCTURE_INTERNALDATE_RFC822SIZE
40.	FETCH_SERIES_FLAGS_BODYPEEK_HEADERFIELDS_INTERNALDATE_RFC822SIZE
41.	FETCH_SERIES_UID
42.	FETCH_UID
43.	LIST
44.	LOGIN
45.	LOGOUT
46.	LSUB_NULL_FOLDER
47.	LSUB_NULL_PART
48.	LSUB_NULL_SENT
49.	LSUB_NULL_WILDCARD
50.	LSUB_WILDCARD_WILDCARD
51.	NOOP
52.	RENAME_FOLDER_FOLDER
53.	RENAME_INBOXINBOXSENT_INBOXTRASHINBOXSENT
54.	SEARCH_ALL_DELETED
55.	SEARCH_ALL_RANGE_CHARSET_RFCHEADER
56.	SEARCH_ALL_RFCHEADER
57.	SEARCH_ALL_UNDELETED_UNSEEN
58.	SEARCH_DELETED
59.	SEARCH_RFCHEADER
60.	SEARCH_UNDELETED
61.	SEARCH_UNSEEN
62.	SELECT_
63.	SELECT_FOLDER
64.	SELECT_FOLDER_ITEMS
65.	SELECT_INBOX
66.	SELECT_INBOXSENT
67.	SELECT_INBOXSENT_ITEMS
68.	SELECT_SENT
69.	SELECT_SENT_ITEMS
70.	STARTED
71.	STATUS_FOLDER_ITEMS_MESSAGES_UNSEEN
72.	STATUS_FOLDER_ITEMS_UNSEEN
73.	STATUS_FOLDER_MESSAGES
74.	STATUS_FOLDER_MESSAGES_RECENT_UNSEEN_UIDVALIDITY_UIDNEXT
75.	STATUS_FOLDER_MESSAGES_UNSEEN
76.	STATUS_FOLDER_UIDNEXT
77.	STATUS_FOLDER_UIDNEXT_UIDVALIDITY_MESSAGES
78.	STATUS_FOLDER_UNSEEN
79.	STATUS_INBOXSENT_ITEMS_MESSAGES_UNSEEN
80.	STATUS_INBOXSENT_ITEMS_UNSEEN
81.	STATUS_INBOXSENT_UNSEEN
82.	STATUS_INBOXSENT_MESSAGES_UNSEEN
83.	STATUS_INBOX_MESSAGES_RECENT_UNSEEN_UIDVALIDITY_UIDNEXT
84.	STATUS_INBOX_MESSAGES_UNSEEN
85.	STATUS_INBOX_UIDNEXT
86.	STATUS_INBOX_UIDNEXT_UIDVALIDITY_MESSAGES
87.	STATUS_INBOX_UNSEEN
88.	STATUS_SENT_ITEMS_MESSAGES_UNSEEN
89.	STATUS_SENT_ITEMS_UNSEEN
90.	STATUS_SENT_MESSAGES_UNSEEN
91.	STATUS_SENT_UNSEEN
92.	STORE_NUM_SET_FLAGS_ANSWERED
93.	STORE_NUM_SET_FLAGS_DELETED
94.	STORE_NUM_SET_FLAGS_SEEN
95.	STORE_NUM_UNSET_FLAGS_DELETED
96.	STORE_NUM_UNSET_FLAGS_SEEN
97.	STORE_RANGE_SET_FLAGS_DELETED
98.	STORE_RANGE_SET_FLAGS_SEEN
99.	STORE_SERIES_SET_FLAGS_DELETED
100.	STORE_UNTILEND_SET_FLAGS_DELETED
101.	STORE_UNTILEND_SET_FLAGS_SEEN
102.	SUBSCRIBE_FOLDER
103.	SUBSCRIBE_INBOXSENT
104.	UID_COPY_NUM_FOLDER
105.	UID_COPY_NUM_INBOX
106.	UID_COPY_NUM_INBOXSENT
107.	UID_COPY_RANGE_FOLDER
108.	UID_COPY_RANGE_INBOX
109.	UID_COPY_RANGE_INBOXSENT
110.	UID_COPY_SERIES_FOLDER
111.	UID_FETCH_NUM_BODY
112.	UID_FETCH_NUM_BODYALL
113.	UID_FETCH_NUM_BODYPARTS
114.	UID_FETCH_NUM_BODYPEEK
115.	UID_FETCH_NUM_BODYPEEKALL
116.	UID_FETCH_NUM_BODYPEEK_HEADER
117.	UID_FETCH_NUM_BODYPEEK_UID
118.	UID_FETCH_NUM_BODYSTRUCTURE
119.	UID_FETCH_NUM_BODY_BODYMIMEALL_BODYMIMEPARTS_HEADER
120.	UID_FETCH_NUM_BODY_BODYMIMEALL_HEADER
121.	UID_FETCH_NUM_BODY_HEADER
122.	UID_FETCH_NUM_ENVELOPE
123.	UID_FETCH_NUM_FLAGS
124.	UID_FETCH_NUM_RFC822SIZE
125.	UID_FETCH_NUM_UID
126.	UID_FETCH_NUM_UID_BODYPEEK_FLAGS_INTERNALDATE
127.	UID_FETCH_NUM_UID_BODYPEEK_FLAGS_INTERNALDATE_RFC822SIZE
128.	UID_FETCH_NUM_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
129.	UID_FETCH_NUM_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
130.	UID_FETCH_NUM_UID_BODYPEEK_RFC822SIZE
131.	UID_FETCH_NUM_UID_BODY_RFC822SIZE
132.	UID_FETCH_RANGE_UID_BODYPEEK_FLAGS_INTERNALDATE
133.	UID_FETCH_RANGE_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
134.	UID_FETCH_RANGE_UID_BODYPEEK_RFC822SIZE
135.	UID_FETCH_RANGE_UID_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE
136.	UID_FETCH_RANGE_UID_FLAGS
137.	UID_FETCH_RANGE_UID_RFC822SIZE_BODYPEEK_HEADERFIELDS
138.	UID_FETCH_RANGE_UID_UID_BODYPEEK_HEADER_HEADERFIELDS_FLAGS_FLAGS_RFC822SIZE_RFC822SIZE_UID
139.	UID_FETCH_SERIES_UID_BODYPEEK_FLAGS_INTERNALDATE
140.	UID_FETCH_SERIES_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
141.	UID_FETCH_SERIES_UID_BODYPEEK_RFC822SIZE
142.	UID_FETCH_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
143.	UID_FETCH_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
144.	UID_FETCH_UNTILEND_BODYPEEK_HEADERFIELDS_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE_UID
145.	UID_FETCH_UNTILEND_ENVELOPE_FLAGS_INTERNALDATE_RFC822SIZE_UID
146.	UID_FETCH_UNTILEND_FLAGS
147.	UID_FETCH_UNTILEND_UID_BODYPEEK_HEADERFIELDS_FLAGS_RFC822SIZE
148.	UID_FETCH_UNTILEND_UID_BODYPEEK_HEADER_FLAGS_INTERNALDATE_RFC822SIZE
149.	UID_FETCH_UNTILEND_UID_FLAGS
150.	UID_FETCH_UNTILEND_UID_FLAGS_INTERNALDATE_RFC822HEADER_RFC822SIZE
151.	UID_SEARCH_ANSWERED
152.	UID_SEARCH_DELETED
153.	UID_SEARCH_FLAGGED
154.	UID_SEARCH_HEADER_QUESTION_RFCHEADER_UNDELETED
155.	UID_SEARCH_HEADER_RFCHEADER_UNDELETED
156.	UID_SEARCH_HEADER_UNDELETED
157.	UID_SEARCH_KEYWORD
158.	UID_SEARCH_NOTDELETED_UID_UNTILEND
159.	UID_SEARCH_RFCHEADER_UNDELETED
160.	UID_SEARCH_SEEN
161.	UID_SEARCH_SINCE
162.	UID_SEARCH_UID_NUM
163.	UID_SEARCH_UID_NUM_NOTDELETED
164.	UID_SEARCH_UID_RANGE
165.	UID_SEARCH_UID_RANGE_NOTDELETED
166.	UID_SEARCH_UID_UNTILEND_UNDELETED_UNDRAFT_UNSEEN
167.	UID_SEARCH_UID_UNTILEND_UNDELETED_UNSEEN
168.	UID_SEARCH_UNDELETED
169.	UID_SEARCH_UNDELETED_UNSEEN
170.	UID_SEARCH_UNSEEN
171.	UID_SEARCH_UNTILEND
172.	UID_STORE_NUM_SET_FLAGS_ANSWERED
173.	UID_STORE_NUM_SET_FLAGS_ANSWERED_DELETED_SEEN
174.	UID_STORE_NUM_SET_FLAGS_ANSWERED_SEEN
175.	UID_STORE_NUM_SET_FLAGS_DELETED
176.	UID_STORE_NUM_SET_FLAGS_DELETED_SEEN
177.	UID_STORE_NUM_SET_FLAGS_FLAGGED
178.	UID_STORE_NUM_SET_FLAGS_SEEN
179.	UID_STORE_NUM_SET_FLAGS_SEEN_ANSWERED
180.	UID_STORE_NUM_SET_FLAGS_SEEN_DELETED
181.	UID_STORE_NUM_UNSET_FLAGS
182.	UID_STORE_NUM_UNSET_FLAGS_ANSWERED
183.	UID_STORE_NUM_UNSET_FLAGS_DELETED
184.	UID_STORE_NUM_UNSET_FLAGS_FLAGGED
185.	UID_STORE_NUM_UNSET_FLAGS_FLAGGED_ANSWERED
186.	UID_STORE_NUM_UNSET_FLAGS_FLAGGED_FORWARDED_MDNSENT_DELETED_DRAFT
187.	UID_STORE_NUM_UNSET_FLAGS_SEEN
188.	UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED
189.	UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED
190.	UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED_DRAFT_FLAGGED
191.	UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_DELETED_FLAGGED
192.	UID_STORE_NUM_UNSET_FLAGS_SEEN_ANSWERED_FLAGGED
193.	UID_STORE_NUM_UNSET_FLAGS_SEEN_DELETED
194.	UID_STORE_NUM_UNSET_FLAGS_SEEN_FLAGGED
195.	UID_STORE_NUM_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
196.	UID_STORE_NUM_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_DELETED_DRAFT_FLAGGED
197.	UID_STORE_NUM_UNSET_FLAGS_SEEN_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
198.	UID_STORE_RANGE_SET_FLAGS_ANSWERED
199.	UID_STORE_RANGE_SET_FLAGS_DELETED
200.	UID_STORE_RANGE_SET_FLAGS_DELETED_SEEN
201.	UID_STORE_RANGE_SET_FLAGS_SEEN
202.	UID_STORE_RANGE_SET_FLAGS_SEEN_DELETED
203.	UID_STORE_RANGE_UNSET_FLAGS
204.	UID_STORE_RANGE_UNSET_FLAGS_ANSWERED_FORWARDED_MDNSENT_DELETED_DRAFT_FLAGGED
205.	UID_STORE_RANGE_UNSET_FLAGS_DELETED
206.	UID_STORE_RANGE_UNSET_FLAGS_SEEN
207.	UID_STORE_RANGE_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
208.	UID_STORE_SERIES_SET_FLAGS_DELETED
209.	UID_STORE_SERIES_SET_FLAGS_DELETED_SEEN
210.	UID_STORE_SERIES_SET_FLAGS_SEEN
211.	UID_STORE_SERIES_UNSET_FLAGS_SEEN_FORWARDED_MDNSENT_ANSWERED_DELETED_DRAFT_FLAGGED
212.	UID_STORE_UNSET_FLAGS_SEEN
213.	UNSUBSCRIBE_FOLDER
214.	SEARCH_ALL_CALL_INFORMATION
215.	UID_COPY_NUM_
216.	UID_COPY_NUM_TRASH
217.	UID_COPY_RANGE_TRASH
218.	UID_COPY_SERIES_TRASH
219.	UID_FETCH_NUM_BODYPEEK_RFC822SIZE_UID
220.	UID_FETCH_NUM_BODY_RFC822SIZE_UID
221.	UID_FETCH_NUM_UID_BODYPEEK_HEADER_FLAGS_RFC822SIZE
222.	UID_FETCH_RANGE_UID_BODYPEEK_HEADER_FLAGS_RFC822SIZE
223.	LSUB
224.	SUBSCRIBE_TRASH
225.	UID_COPY_NUM
226.	UID_COPY_RANGE
227.	UID_COPY_SERIES
228.	UID_FETCH_NUM_BODYMIMEALL
229.	UID_FETCH_NUM_UID_BODYSTRUCTURE
230.	UID_FETCH_RANGE_BODYPEEK_HEADERFIELDS
231.	UID_FETCH_UNTILEND_FLAGS_RFC822SIZE
232.	SESSION_START

Functionally, there are many redundant states. However, it was felt that the difference in effort between using the message unique identifier (UID) and the variable message index number is significant. The UID remains fixed for the life of the mailstore. The message index number changes as the number of messages changes. This means a message has many relative index numbers across multiple IMAP sessions. For similar reasons, it was felt that operating on a contiguous range of messages generates a different workload than a random set of message numbers.

The various FETCH commands against headers and body sections make more sense when divided according to command sequences and client types. Many of these are the other half of a folder probe, depending on the client type. The presence or absence of MIME parts also factors into the construction of the final command, as these lead to further probes of individual MIME parts.

The actual state transition charts used by the benchmark are too complex for this document. The Architecture White Paper provides the named objects used in the SPECmail2009 source code.

Inter-arrival Time Distributions

The uncertainty or randomness associated with the arrival of IMAP commands to the server is modeled using a Markov model to specify the statistical relationships between commands as transition-probability matrices. Each command sequence has its own probabilities and probability distributions of specific states. All entries in these transition-probability matrices were derived from a subset of the data samples. The base inter-arrival transition matrix uses a lognormal formula similar to the one used for SPECMail2001. Please consult the source code for the actual values.
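A Markov model of this kind can be sketched with a small transition table and lognormal delays between commands. Every number below is a hypothetical placeholder: the actual transition probabilities and lognormal parameters are in the benchmark source code, and only a handful of the tracked states are used here.

```python
import random

# Hypothetical transition probabilities between a few of the tracked states.
TRANSITIONS = {
    "LOGIN": {"SELECT_INBOX": 1.0},
    "SELECT_INBOX": {"NOOP": 0.6, "UID_FETCH_NUM_BODYPEEK": 0.4},
    "NOOP": {"NOOP": 0.7, "UID_FETCH_NUM_BODYPEEK": 0.2, "LOGOUT": 0.1},
    "UID_FETCH_NUM_BODYPEEK": {"NOOP": 0.8, "LOGOUT": 0.2},
}

# Hypothetical lognormal parameters for the delay between commands (seconds).
MU, SIGMA = 1.0, 0.5

def simulate_session(rng, max_steps=50):
    """Walk the chain from LOGIN, recording (elapsed seconds, state)."""
    state, elapsed, trace = "LOGIN", 0.0, []
    for _ in range(max_steps):
        trace.append((elapsed, state))
        if state == "LOGOUT":
            break
        row = TRANSITIONS[state]
        # Pick the next state according to the current row of the matrix.
        state = rng.choices(list(row), weights=list(row.values()))[0]
        # Draw the inter-arrival time to the next command.
        elapsed += rng.lognormvariate(MU, SIGMA)
    return trace

trace = simulate_session(random.Random(3))
for when, state in trace:
    print(f"{when:8.2f}s  {state}")
```

In the benchmark each command sequence has its own matrices; a single table is used here only to keep the example short.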

IMAP Scaling Considerations

Since each IMAP user represents a variable number of IMAP sessions and incurs a considerable amount of storage space, scaling has problems at both ends of the mandated range. A further complication is the actual distribution of command sequences, since some occur much more frequently than others.

A benchmark run with too few users runs into the problem of missing command sequences as well as compliance with the mail store folder structures, message size and count distributions. Experiments have shown that a minimum of 250 users must be used for a compliant run that meets folder and message structure distributions.

The other end of the scaling problem lies in the number of IMAP users and the fact that each user represents one or more concurrent IMAP sessions. The IMAP benchmark appears to support fewer users per load generator than SPECmail2001, but this is misleading: the proper comparison is the number of concurrent client sessions and the activity level within those sessions.

The POP3 benchmark defines very short session times, on the order of a few seconds for the at least 75% of POP3 sessions that find no messages to retrieve. So despite the large number of defined POP3 users, the 25% of users who are active log into the system only four times during the peak hour. Furthermore, the typical POP3 session lasts only 2-5 seconds, executing at most 10 commands (25%), but usually three commands (75%) within each session. This means that only a small subset of users is actively connected at any one time.

In contrast, IMAP users are always connected and active. The number of IMAP users determines the minimum number of concurrent IMAP sessions that stay logged into the IMAP server for the entire peak hour simulation. The client type distribution values then determine the number of ancillary IMAP sessions that will be generated. This means the IMAP server should allow at least

4.5 X UserCount

concurrent client socket connections.
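The contrast between the two benchmarks can be made concrete with a short calculation. The IMAP figure uses the 4.5 factor stated above. The POP3 figure is a rough estimate only: it assumes, as a deliberate simplification, that every user logs in four times during the peak hour with 5-second sessions, and it applies Little's law (mean sessions in progress = arrival rate x mean duration). The user counts are hypothetical.

```python
import math

SESSIONS_PER_IMAP_USER = 4.5  # factor stated in the text

def imap_min_connections(user_count):
    """Concurrent client socket connections the IMAP server should allow."""
    return math.ceil(SESSIONS_PER_IMAP_USER * user_count)

def pop3_mean_concurrency(user_count, logins_per_hour, session_seconds):
    """Little's law estimate of POP3 sessions in progress at any instant."""
    arrivals_per_second = user_count * logins_per_hour / 3600.0
    return arrivals_per_second * session_seconds

# Hypothetical user counts, for illustration only.
print(imap_min_connections(250))
print(round(pop3_mean_concurrency(10000, 4, 5), 1))
```

Under these assumptions a small IMAP population already holds more open connections than a far larger POP3 population has in progress at any instant.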