Summary of results of Extremal Index analysis
of UNC HTTP response size data.

Output files are in the web directory:

http://www.unc.edu/depts/statistics/postscript/papers/marron/NetworkData/ExtremalIndex/


of the form:

UNC2001RS?ExtInd1T##ssss***.ps


where:


? is either one or two (depending on the calling
program, not relevant here)


## indexes the time block:

11:  Thursday afternoon (peak time)
19:  Sunday morning (off-peak time)


ssss reflects the variable being studied (three or four letters):

siz:  response size, in bytes
tim:  response time (duration), in seconds
rat:  average rate of response transmission, just siz/tim,
        in bytes/sec
irat:  inverse rate, tim/siz, sec/byte.  This is useful
        for studying rates in contexts where times are
        large.
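
Because rat and irat are ratios, they are only defined when both the size and the time are nonzero, which is one reason for the nonzero restriction described below. A minimal Python sketch of deriving the four variables, assuming the raw data are paired arrays of sizes (bytes) and durations (seconds); the function name derived_variables is hypothetical and not part of the original analysis:

```python
import numpy as np

def derived_variables(size_bytes, time_sec):
    """Hypothetical helper: siz, tim, rat and irat for usable responses.

    Responses with zero size or zero duration are dropped first, since
    rat = siz/tim and irat = tim/siz are undefined when either is zero.
    """
    siz = np.asarray(size_bytes, dtype=float)
    tim = np.asarray(time_sec, dtype=float)
    keep = (siz > 0) & (tim > 0)      # nonzero size AND nonzero time
    siz, tim = siz[keep], tim[keep]
    return {
        "siz": siz,                   # bytes
        "tim": tim,                   # seconds
        "rat": siz / tim,             # bytes/sec
        "irat": tim / siz,            # sec/byte
    }
```

For example, derived_variables([1000, 0, 5000], [0.5, 0.2, 2.0]) keeps only the first and third responses, with rates 2000 and 2500 bytes/sec.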


*** shows how the data have been truncated:

nothing (no suffix) means that all responses with both a
nonzero size and a nonzero time (duration) are included in
the analysis.

10k means that only responses with size > 10k are included.

100k means that only responses with size > 100k are included.
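
As an illustration of the naming scheme, here is a hypothetical Python decoder (decode and FILENAME_RE are illustrative names, not from the original programs). It assumes that only the codes listed above occur; an unlisted time block number is passed through unchanged:

```python
import re

# Pattern: UNC2001RS?ExtInd1T##ssss***.ps, with the codes listed above.
FILENAME_RE = re.compile(
    r"^UNC2001RS(?P<prog>[12])ExtInd1T(?P<block>\d{2})"
    r"(?P<var>siz|tim|rat|irat)(?P<trunc>10k|100k)?\.ps$"
)

BLOCKS = {"11": "Thursday afternoon (peak)", "19": "Sunday morning (off peak)"}
VARIABLES = {
    "siz": "response size (bytes)",
    "tim": "response time (seconds)",
    "rat": "rate, siz/tim (bytes/sec)",
    "irat": "inverse rate, tim/siz (sec/byte)",
}
TRUNCATIONS = {
    "": "all responses with nonzero size and time",
    "10k": "size > 10k",
    "100k": "size > 100k",
}

def decode(filename):
    """Translate an output file name into its time block, variable and truncation."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not an extremal index output file: {filename}")
    trunc = m.group("trunc") or ""    # no suffix means no truncation
    return {
        "block": BLOCKS.get(m.group("block"), m.group("block")),
        "variable": VARIABLES[m.group("var")],
        "truncation": TRUNCATIONS[trunc],
    }
```

For example, decode("UNC2001RS1ExtInd1T11irat100k.ps") reports the Thursday afternoon block, the inverse rate, and the size > 100k truncation.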



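Here is some discussion of the results.  Recall that the
Extremal Index being studied is essentially "1 / (expected
number of extrema that occur together)", i.e. one over the
mean cluster size of the extremes.  So an extremal index of
0.5 suggests that extrema occur in pairs, while a value near
1 suggests that they occur in isolation.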
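
The source does not say which estimator produced the plots. As an illustration only, here is a minimal Python sketch of one common choice, a blocks estimator; the name blocks_extremal_index and the synthetic Pareto data are assumptions. It checks the interpretation above: an independent series gives an estimate near 1, and a series in which every value is repeated once in a row gives one near 0.5.

```python
import numpy as np

def blocks_extremal_index(x, threshold_prop, block_len):
    """Blocks estimate of the extremal index theta (hypothetical helper).

    Extremes are values above the upper `threshold_prop` empirical quantile.
    With n observations in k blocks of length b, N exceedances and K blocks
    containing at least one exceedance,
        theta_hat = log(1 - K/k) / (b * log(1 - N/n)),
    which is approximately K / N (1 / mean cluster size) when
    exceedances are rare.
    """
    x = np.asarray(x, dtype=float)
    k = len(x) // block_len                 # number of complete blocks
    n = k * block_len                       # drop the incomplete tail block
    u = np.quantile(x[:n], 1.0 - threshold_prop)
    exceed = (x[:n] > u).reshape(k, block_len)
    N = exceed.sum()                        # total exceedances
    K = exceed.any(axis=1).sum()            # blocks with >= 1 exceedance
    if N == 0 or K == k:
        return float("nan")                 # estimator undefined here
    return float(np.log(1 - K / k) / (block_len * np.log(1 - N / n)))

rng = np.random.default_rng(0)
iid = rng.pareto(1.5, size=200_000)                    # no clustering
paired = np.repeat(rng.pareto(1.5, size=100_000), 2)   # each value occurs twice in a row
print(blocks_extremal_index(iid, 0.01, 50))      # should be close to 1
print(blocks_extremal_index(paired, 0.01, 50))   # should be close to 0.5
```

The log form is used instead of the plain ratio K/N because the plain ratio is biased downward when a block can contain several independent exceedances.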

First the peak time of Thursday afternoon is studied in
detail:

1.  Response sizes:
UNC2001RS2ExtInd1T11siz.ps
UNC2001RS2ExtInd1T11siz10k.ps
UNC2001RS1ExtInd1T11siz100k.ps
    The first two are rather similar, with a fat peak whose
height is about 0.9, suggesting that the largest sizes do not
tend to cluster.  The third has a somewhat lower peak.

2.  Response times (durations)
UNC2001RS2ExtInd1T11tim.ps
UNC2001RS2ExtInd1T11tim10k.ps
UNC2001RS1ExtInd1T11tim100k.ps
    Here the peak heights are smaller than in the size plots
above, but only slightly so.  Much more noticeable is that
for smaller threshold proportions (see the upper left plots)
the estimates fall off very substantially, suggesting that
the largest response times tend to occur together (and much
more often than the corresponding sizes).
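
The remark about small threshold proportions concerns how the estimate changes as the fraction of data counted as extreme shrinks. A minimal sketch of computing such a curve follows, assuming the runs estimator (a swapped-in technique; the source does not name the estimator behind the plots). The names runs_extremal_index and extremal_index_curve, and the run_len parameter, are hypothetical.

```python
import numpy as np

def runs_extremal_index(x, threshold_prop, run_len):
    """Runs estimate of the extremal index theta (hypothetical helper).

    An exceedance of the upper `threshold_prop` empirical quantile closes a
    cluster when it is followed by `run_len` non-exceedances.
    theta_hat = (number of clusters) / (number of exceedances).
    """
    x = np.asarray(x, dtype=float)
    u = np.quantile(x, 1.0 - threshold_prop)
    exceed = x > u
    n_exceed = int(exceed.sum())
    if n_exceed == 0:
        return float("nan")
    n_clusters = 0
    # Exceedances in the last run_len positions cannot be checked; skip them.
    for i in np.flatnonzero(exceed[: len(x) - run_len]):
        if not exceed[i + 1 : i + 1 + run_len].any():
            n_clusters += 1
    return n_clusters / n_exceed

def extremal_index_curve(x, proportions, run_len=1):
    """Estimates over a grid of threshold proportions (fraction called extreme)."""
    return [runs_extremal_index(x, p, run_len) for p in proportions]
```

Applied to response times, for instance extremal_index_curve(tim, [0.001, 0.005, 0.01, 0.05], run_len=5), a curve that drops at the smallest proportions would match the pattern described above. The choice of run_len is a tuning assumption; larger values merge exceedances that are further apart into one cluster.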

3.  Rates (size / time)
UNC2001RS2ExtInd1T11rat.ps
UNC2001RS2ExtInd1T11rat10k.ps
UNC2001RS1ExtInd1T11rat100k.ps
    Here the amount of clustering seems to depend strongly
on the threshold.  Any ideas as to why?

4.  Inverse Rates (time/size)
UNC2001RS2ExtInd1T11irat.ps
UNC2001RS2ExtInd1T11irat10k.ps
UNC2001RS1ExtInd1T11irat100k.ps
    Going from no truncation to 10k leaves the peak height
similar, but the largest values tend to cluster.  At 100k
this effect disappears, although the overall peak drops to
only about 2/3.

Note that generally rate follows size (as expected, since
large sizes drive rate), and inverse rate follows time
(again, driven by large response times).
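
One way to check the claim that rate follows size and inverse rate follows time is to ask how many of the largest rates are also among the largest sizes. A hypothetical Python helper for that (top_overlap is an illustrative name; this diagnostic is not part of the original analysis):

```python
import numpy as np

def top_overlap(a, b, prop):
    """Fraction of the top-`prop` observations of `a` that are also in the top-`prop` of `b`.

    Hypothetical diagnostic: a and b must be paired arrays of equal length.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m = max(1, int(round(prop * len(a))))   # how many values count as "top"
    top_a = set(np.argsort(a)[-m:].tolist())
    top_b = set(np.argsort(b)[-m:].tolist())
    return len(top_a & top_b) / m
```

For example, top_overlap(rat, siz, 0.01) near 1 would indicate that the top 1% of rates are mostly the top 1% of sizes, and top_overlap(irat, tim, 0.01) plays the same role for inverse rate and time.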



Sunday morning:  the big-picture lessons are similar:

a.  sizes and rates tend to suggest no clustering.

b.  times and inverse rates do show the influence of clustering.

c.  the effect in (b) is strongest for the highest
thresholds (i.e. the largest values).

d.  differences become smaller when the data are restricted
to 100k.



So overall I think the lesson is that sizes and rates tend
to show less clustering than times and inverse rates.  This
effect is strongest at the highest thresholds.



