Hi Christian,
    Thanks very much for the detailed and thoughtful reviews
of the paper "Log-normal durations can give long range
dependence", by Hannig, Samorodnitsky, Smith and me, for the
van Eeden volume.
    We have finished a revision that can be found as the
files LogNorm2LRD2.tex and LogNorm2LRD2.pdf in the web 
directory:



    Here is a summary of changes we have made:

Editorial Comment:

Main Problem 1:  
  At the time we did this research, we thought that duration
and size were basically interchangeable, based on the
reasoning "larger file sizes require more time".  Since then
our research has revealed that this view is rather naive,
and in fact there is much to be learned from careful study
of this relationship.  However, this material will probably
fill two more papers (that are currently approaching the
writing phase).  While the story is interesting, a reasonable
treatment of it is too long to go into here.
  Now to address the problem, note that it is essential to
study durations in the views of Section 1 (there is no size
analog of these plots).  We considered replacing the size
analysis in Section 2 by the corresponding analysis of time.
However, this doesn't work because of the nature of the
data: basically there are a number of flows that essentially
span the whole time range, making it impossible to study
"tails" for this variable.  This was the meaning of the
original first sentence of Section 2.
  We ended up addressing this by keeping the present
analysis, but explaining these choices better at the
beginning of Section 2.

Main Problem 2:
  This issue gets into what I believe are allowable personal
views of what asymptotics are all about.  I understand the
reviewers' point here, but have a different personal view.
I have debated such issues at length with others, and have
concluded that we are never going to convince each other.
It seems the best we can hope for is respect for each other's
views.
  The issue is simpler to discuss in the context of why one 
studies classical "n tends to infinity" asymptotics.  I have
heard opinions to the tune of modelling some type of
"increasing data" process.  However, for me asymptotics are
ONLY a mathematical tool for getting at simple structure
which underlies complex phenomena.  I don't believe in
"increasing sample size" models, because I usually only have
ONE set of data.  Hence, I am quite comfortable with other
types of asymptotics, e.g. sigma tends to 0 in regression,
that also are not easy for those wedded to sampling models to
come to grips with.  I view the asymptotics in the present
paper as being just an extension of this idea.  The
objections raised here are in the spirit of pointing out
that there is no sampling model to which they correspond.  I
agree with that, but don't agree that this is a problem.

- Definitions:
  Good point.  Parenthetical explanations now appear where
these are first used.  "flow" was carefully defined after
Figure 1, but certainly some explanation was needed earlier.

- Visual Quality
  In my opinion, the main problem here is the constraint of
black and white graphics.  In my research, I routinely use
color, which does a great job of separating these.  But
yours is not the only forum to insist on black and white,
and I agree the earlier presentation (that was my first attempt
at this) was unacceptable.  The choice that I made was to
use "dotted lines" for the envelope.
  The confidence band suggestion is sensible, but my
personal opinion on the issue is that too many people tend
to take such bands too seriously (actually believing the
boundary is that precise).  I like the envelope because it
reflects the sampling variation in a way that conveys the
imprecision.

- Log normal simulation
  The exponential distribution was chosen not because it
fits well, but because of its history.  In particular, based
on successful experience with modelling the telephone
network, the first models used for the internet were based
on exponential distributions.  Because these models are so
entrenched in the large literature called "queueing theory",
it actually took a while for people to realize how
inappropriate the model actually is.  The whole point of the
visualization of Figure 2 is to show GRAPHICALLY how
inappropriate the exponential model is.  I understand the 
sense in which it would be more consistent with Section 2 to 
use the log normal, but in my opinion the graphic would no
longer make this important point, and indeed would look too
similar to Figure 1 to give any useful insights.

- Quality of fits
  Yes, real data are often messy, and internet traffic
tends to be an extreme case of this type.  It is notoriously
hard to fit models to it.  There is some irony here in that
we have rather set ourselves up for this criticism by being
careful about the statistical analysis (through overlaying
the envelope).  Nobody else in the internet world does this
yet, instead just eyeballing ONLY the Q-Q plot and saying
"yup, seems to fit...".  For us, all models are just
approximations (and often not all that good an
approximation, as noted by the reviewers).  They are clumsy, 
but that seems to be the nature of the beast that we are 
studying here.

- Typo:  fixed



Ref 1:

Main discussion:  These points are well taken, if one
insists that asymptotics should follow some pre-conceived
model of "sampling".  But as noted above, the paper is
written in the somewhat different spirit of "asymptotics as
a tool for insights only".  From this point of view, I suggest
that our asymptotics are properly formulated.

Specific Comments:

1.  Agreed.  This is fixed as noted above.

2.  Again fixed as noted above.

3.  Addressed as noted above.

4.  Right, one could use alpha for either the tail index, or
for the autocovariance (with a corresponding +-1 for the
other).  We have followed the convention that is most common
in the world of extreme values.  Indeed "polynomial" is
clunky when alpha is not an integer, but as noted, it does
provide the nicest duality with "exponential".  A
parenthetical description of the "~" symbol is now given at
the first point that it appears.

5.  Here is a painful part of modern graphics:  something that
looks fine on the computer screen may not print too well. On
my screen, the envelope is nicely gray, but somehow it
becomes darker while printing.  This problem is now solved
by using dotted lines for the envelope.

6.  The fit is indeed poor in the lower tail.  The
suggestion of deleting that part of the plots is
interesting.  After some thought, we decided not to do it,
on the grounds that there is some value in showing the 
reader clearly how bad the fit actually is in the lower
tail.

7.  Answered above.

8.  Such plots are useful, and indeed will appear in an
upcoming paper.  We decided against including them here.

9.  Good point, this is now fixed in the abstract.

10.  I don't see this.  I guess the intended histogram is
about the aggregated traffic, but note that such a view
would completely miss the way that the duration distribution
enters to create long range dependence.

11.  Fair enough.  This is now changed to "particularly good
fit".

12.  Agreed.

13.  Right.

14.  The Pareto distribution is very familiar in the extreme
value theory world, and thus also for internet traffic
researchers.  But we agree that for the broader audience we
have here, we should be more explicit.  However, we decided
to go with the cdf instead of the density.

15.  This is now fixed.

16.  Thanks.

17.  That was done to keep everything on one line.  It is
now fixed by splitting the line.

18.  We have left this as is, on the grounds that giving the
value at this point would only be distracting.

19.  The point is that our main result does not hold
uniformly over lags, but only for "most lags", in this
sense.  We were not able to think of a better way to explain
this.

20.  Thanks.



Ref 2:

General:

1.  Fixed.

2.  Hopefully the present formulation is acceptable.

3.  Addressed above.

4.  Point taken.  If we thought the only readers of this
paper would be statisticians, we could delete this.  
However, there has already been considerable interest in 
this paper by non-statisticians in the internet world, so 
for them it seems worth leaving this in.

5.  Addressed above.

6.  I am not sure I have understood the objection.  Yes,
there is a limit to how far our analysis went, and we tried
to be sufficiently mealy-mouthed about this in the original
version (e.g. "suggested" in the second sentence).  The
only change I could think of here was "addressed" became
"considered".


Specific:
These are all fixed.
