By David Baron
This is an analysis of a jprof profile of running jrgm's tests [Netscape internal] on a build from 2001-11-08 (or maybe 11-07, I'm not sure) with my patch for bug 83836 in it, but no other changes. I used the default settings (5 runs through the pages, allowing the browser to cache), except for an interval of 500ms. The profile was done with JPROF_FLAGS set to "JP_REALTIME JP_PERIOD=0.002 JP_DEFER". My machine is a single-processor 1GHz Pentium III with an IDE hard disk on the Netscape network (recently upgraded), running Red Hat Linux 7.1. The stack traces on my machine occasionally seem to skip a stack frame here and there, and some of the resulting artifacts are noticeable. It's probably because I'm using gcc 3.
I split up the profile using jprof's -i and -e options. For each split done with -i function, -e function was used for all the splits that followed (except for two splits provided as additional information). The numbers on the files indicate the order of the splits, and the functions split on were the following:
0 | poll (not included in any profiles) |
1 | RuleProcessorData::RuleProcessorData(nsIPresContext*, nsIContent*, nsRuleWalker*, nsCompatibility*) |
2 | StyleSetImpl::FileRules(int (*)(nsISupports*, void*), RuleProcessorData*) |
3 | nsViewManager::Refresh(nsView*, nsIRenderingContext*, nsIRegion*, unsigned) |
4 | ViewportFrame::Reflow(nsIPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned&) |
5 | nsGIFDecoder2::ProcessData(unsigned char*, unsigned) |
6 | DocumentViewerImpl::Destroy() |
7 | js_GC |
8 | nsContentTreeOwner::SetStatus(unsigned, unsigned short const*) |
9 | nsDocShell::PersistLayoutHistoryState() |
10 | nsFontCache::GetMetricsFor(nsFont const&, nsIAtom*, nsIFontMetrics*&) |
11 | nsParser::Tokenize(int) |
12 | nsImageFrame::LoadImage(nsAString const&, nsIPresContext*, imgIRequest*) |
13 | nsCSSFrameConstructor::InitAndRestoreFrame(nsIPresContext*, nsFrameConstructorState&, nsIContent*, nsIFrame*, nsIStyleContext*, nsIFrame*, nsIFrame*) |
14 | nsCSSFrameConstructor::ConstructFrame(nsIPresShell*, nsIPresContext*, nsFrameConstructorState&, nsIContent*, nsIFrame*, nsFrameItems&) |
15 | HTMLStyleSheetImpl::SetAttributeFor(nsIAtom*, nsHTMLValue const&, int, nsIHTMLContent*, nsIHTMLAttributes*&) |
16 | nsScriptSecurityManager::CheckPropertyAccessImpl(unsigned, nsIXPCNativeCallContext*, JSContext*, JSObject*, nsISupports*, nsIURI*, nsIClassInfo*, long, char const*, char const*, void**) |
17 | nsGenericElement::HasMutationListeners(nsIContent*, unsigned) |
18 | nsStyleContext::GetStyleData(nsStyleStructID) |
19 | HTMLContentSink::AddAttributes(nsIParserNode const&, nsIHTMLContent*, int) |
20 | HTMLContentSink::CreateContentObject(nsIParserNode const&, nsHTMLTag, nsIDOMHTMLFormElement*, nsIWebShell*, nsIHTMLContent**) |
21 | CNavDTD::CanContain(int, int) const |
22 | js_LookupProperty |
23 | imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned, unsigned) |
24 | XPTC_InvokeByIndex |
25 | XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) |
26 | _end |
27 | nsContentList::ContentAppended(nsIDocument*, nsIContent*, int) |
28 | PresShell::StyleSheetAdded(nsIDocument*, nsIStyleSheet*) |
29 | CSSParserImpl::Parse(nsIUnicharInputStream*, nsIURI*, nsICSSStyleSheet*&) |
30 | nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsAFlatString const&) |
31 | nsDiskCacheMap::DeleteStorage(nsDiskCacheRecord*, int) |
32 | nsContainerFrame::ReflowChild(nsIFrame*, nsIPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, int, int, unsigned, unsigned&) |
33 | nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) |
34 | nsParser::BuildModel() |
35 | DocumentViewerImpl::Close() |
36 | nsDocShell::CreateContentViewer(char const*, nsIRequest*, nsIStreamListener**) |
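Chaining the -i and -e options this way means that each timer hit lands in exactly one numbered profile. That profile belongs to the earliest split function that appears anywhere on the hit's stack. The following is a minimal Python model of that bookkeeping. It is not jprof's code. The helper name attribute_samples and the toy stacks are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def attribute_samples(stacks: Iterable[Sequence[str]],
                      split_order: Sequence[str]) -> Counter:
    """Model of chained jprof splits (hypothetical helper, not part of jprof).

    Split k keeps samples whose stack contains split_order[k] (like -i) and
    drops samples containing any earlier split function (like -e), so each
    sample is counted once, under the first split function found on it.
    """
    counts: Counter = Counter()
    for stack in stacks:
        frames = set(stack)
        for func in split_order:
            if func in frames:
                counts[func] += 1
                break
        else:
            # Matched no split function: goes to the final catch-all profile.
            counts["unclassified"] += 1
    return counts


# Toy stacks, outermost frame first (made up for illustration).
stacks = [
    ["main", "poll"],
    ["main", "ViewportFrame::Reflow", "nsStyleContext::GetStyleData"],
    ["main", "nsStyleContext::GetStyleData"],
    ["main", "js_GC"],
]
order = ["poll", "ViewportFrame::Reflow", "nsStyleContext::GetStyleData"]
# Each of poll, Reflow, GetStyleData and "unclassified" receives one sample.
print(attribute_samples(stacks, order))
```

In this model, the GetStyleData call made under reflow is credited to reflow because the reflow split comes first. That is why a chained split for a function taken late in the order can be much smaller than the total time spent in that function.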
The complete list of the resulting profiles is:
00.html
01-StyleRes-RPD.html
02-StyleRes-FileRules.html
03-Paint.html
04-32-All-Reflow-except-GetStyleData.html
04-32-All-Reflow.html
04-Reflow.html
05-GIFDecode.html
06-DocViewer-Destroy.html
07-GC.html
08-SetStatus.html
09-SHistory-State.html
10-FontMetrics-ALL.html
10-FontMetrics-REMAINING.html
11-Tokenize.html
12-LoadImage.html
13-OtherFrameInit.html
14-OtherFrameCtor.html
15-HTMLAttributes.html
16-ScriptSecurity.html
17-HasMutationListeners.html
18-GetStyleData-ALL.html
18-GetStyleData-PARTIAL.html
19-AddAttributes.html
20-CreateContentObject.html
21-CanContain.html
22-js_LookupProperty.html
23-OtherImageDecode.html
24-CallsViaXPC.html
25-XPCInternal.html
26-XServer.html
27-ContentList-ContentAppended.html
28-StyleSheet-Reconstruct.html
29-CSSParse.html
30-EvaluateScript.html
31-DiskCacheMap-DeleteStorage.html
32-DeepReflow.html
33-XPCWrappedJSExecution.html
34-OtherContentSink.html
35-DocViewer-Close.html
36-Other-CreateContentViewer.html
37.html
Although splitting the profiles can help "blame" various modules, a split doesn't necessarily assign blame to the module at its root. However, splitting is definitely useful for making analysis of the profiles at a higher level more manageable.
Going through this list, the first relevant one is the complete profile. You can always go back to this one if you think something relevant to a split you're interested in was captured by an earlier split. It's very difficult to make sense of this profile on a large scale. However, it can be quite useful for analyzing low-level problems, such as callers that do too much string appending. The total number of timer hits in this profile was 16651. (Note that a number of the stacks overflowed jprof's stack-depth limit of 100 frames, which I should really raise one of these days.)
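The percentages that follow are easier to weigh when they are converted back into timer hits or approximate time. The sketch below is rough Python. It assumes that every hit stands for one JP_PERIOD (2 ms) of wall-clock time under JP_REALTIME, which is only an approximation. The helper names are hypothetical. Because the quoted percentages are rounded to 0.1%, each converted count is uncertain by about ±8 hits. Since poll was left out of every profile, the real elapsed time was longer than this estimate.

```python
TOTAL_HITS = 16651   # timer hits in the complete profile (00.html)
PERIOD_S = 0.002     # JP_PERIOD=0.002: one timer tick every 2 ms


def hits_for(percent: float, total: int = TOTAL_HITS) -> int:
    """Approximate hit count behind a percentage quoted in this analysis."""
    return round(total * percent / 100)


def seconds_for(hits: int, period_s: float = PERIOD_S) -> float:
    """Rough time, assuming (not verified) each hit stands for one period."""
    return hits * period_s


print(f"whole profile: ~{seconds_for(TOTAL_HITS):.1f} s")     # ~33.3 s
paint = hits_for(8.9)                                         # painting share
print(f"painting: ~{paint} hits, ~{seconds_for(paint):.2f} s")  # 1482 hits, 2.96 s
```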
Looking first at style resolution, 2.6% of the total time was spent building up the RuleProcessorData for enumeration, and another 6.6% was spent matching selectors (and in all the glue code in between).
8.9% of the total time was spent painting.
The "official" split (11.9%) covering reflow turned out to be inaccurate because many of the hits in reflow overflowed jprof's stack buffer, so their truncated stacks no longer reached ViewportFrame::Reflow. I took a later split (2.4%) to cover these. However, I had split out GetStyleData in between. This means the best profiles to look at for reflow are the one covering all reflow (16.0%) and the one covering all reflow except GetStyleData (13.6%).
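A toy model of the stack-buffer overflow shows how deep reflow hits could escape the ViewportFrame::Reflow split (4) yet still be caught by the later nsContainerFrame::ReflowChild split (32). The model assumes that the stack walk starts at the interrupted (innermost) frame, so the outermost callers are the ones dropped at the 100-frame limit. The nesting depth of 120 and the helper record_stack are hypothetical. Real reflow stacks interleave many different Reflow methods.

```python
from typing import List

MAX_DEPTH = 100  # jprof's stack-depth limit, per the analysis above


def record_stack(full_stack: List[str], max_depth: int = MAX_DEPTH) -> List[str]:
    """Keep only the innermost max_depth frames (list is outermost-first).

    Assumption: the walk begins at the interrupted frame, so when the limit
    is reached it is the outermost callers that get lost.
    """
    return full_stack[-max_depth:]


# Hypothetical deep reflow: 120 nested ReflowChild frames under the viewport.
deep = (["main", "ViewportFrame::Reflow"]
        + ["nsContainerFrame::ReflowChild"] * 120
        + ["nsStyleContext::GetStyleData"])
recorded = record_stack(deep)

print(len(recorded))                                # 100
print("ViewportFrame::Reflow" in recorded)          # False: missed by split 4
print("nsContainerFrame::ReflowChild" in recorded)  # True: caught by split 32
```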
GIF decoding was 2.9% and other image decoding was 0.9%.
The document viewer's Destroy method was 2.9%. This was mostly destruction of the frame tree and related objects.
JavaScript garbage collection was 1.9% and was dominated by marking. We currently force GC on page transitions.
nsContentTreeOwner::SetStatus was 0.3%.
Capturing of state for session history was 0.3%.
The "official" split (0.8%) for font metrics initialization omitted a good bit of it, so it probably makes more sense to look at all of it (2.1%).
Tokenization in the HTML parser was 3.7%.
Handling of image loads, which we kick off from frame construction, was 8.7%. This doesn't count the time spent in poll waiting for the images to arrive off the network (although these were mostly cached runs).
Other frame initialization (excluding images) was 2.3%.
The remaining time within frame construction was 7.3%.
Script security checks took 2.1% of the time.
Checking for mutation listeners took 0.6% of the time, almost half of it doing a QueryInterface on the window object.
The "official" split (1.0%) for GetStyleData excluded a good bit of it, so it's probably more interesting to look at all the time spent in GetStyleData (4.6%). Bug 109261 only accounted for 0.3% here, although it accounts for more when the style system is used for the UI.
(Excluding things above) Setting attributes in the HTML stylesheet took 0.7% of the time. The rest of adding attributes through the content sink took 2.9% of the time.
(Excluding things above) HTMLContentSink::CreateContentObject took 1.4% of the time.
CNavDTD::CanContain took 0.3% of the time.
js_LookupProperty took 2.2% of the time.
Various calls from JS to C++ and the things they called took 3.2% of the time. XPConnect internals involved in making those calls took 1.5% of the time.
Things that connected to the X server (excluding painting above and anything else above) took 2.3% of the time (well, there's a tiny bit of other stuff in this split as well).
nsContentList::ContentAppended took 0.7% of the time.
Frame reconstruction due to stylesheet loads (eek, should this happen?) took 0.3% of the time.
CSS parsing took 0.2% of the time (even though our CSS parser is pretty slow, including the way it builds up its data structures).
Evaluation of web page JavaScript (inline, not through timeouts) took 1.0% of the time.
nsDiskCacheMap::DeleteStorage took 0.3% of the time.
Execution of JS code called from C++ took 1.9% of the time.
Other parser and content sink activity took 7.0% of the time.
DocumentViewerImpl::Close took 1.0% of the time. Much of it was propagating SetDocument calls. (Do we still need these?)
Other things that happened during nsDocShell::CreateContentViewer took 0.7% of the time.
I could not classify the remaining 4.1% of the time.
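Because the numbered splits plus the unclassified remainder are disjoint, the percentages quoted for them should add up to roughly 100%. The quick Python check below uses only figures given in this analysis. It takes the "official" chained values (for example, 1.0% for GetStyleData, 0.8% for font metrics, and 11.9% plus 2.4% for the two reflow splits), and it leaves out the overlapping ALL and combined reflow profiles. It also assumes that the 4.1% remainder corresponds to 37.html.

```python
# Percentages of the chained (disjoint) splits, keyed by profile file number.
# Overlapping views (ALL variants, combined 04-32 reflow) are left out.
SPLIT_PERCENT = {
    1: 2.6, 2: 6.6, 3: 8.9, 4: 11.9, 5: 2.9, 6: 2.9, 7: 1.9, 8: 0.3,
    9: 0.3, 10: 0.8, 11: 3.7, 12: 8.7, 13: 2.3, 14: 7.3, 15: 0.7, 16: 2.1,
    17: 0.6, 18: 1.0, 19: 2.9, 20: 1.4, 21: 0.3, 22: 2.2, 23: 0.9, 24: 3.2,
    25: 1.5, 26: 2.3, 27: 0.7, 28: 0.3, 29: 0.2, 30: 1.0, 31: 0.3, 32: 2.4,
    33: 1.9, 34: 7.0, 35: 1.0, 36: 0.7,
    37: 4.1,  # unclassified remainder (assumed to be 37.html)
}

total = sum(SPLIT_PERCENT.values())
# Each figure was rounded to 0.1%, so each entry may be off by up to 0.05%.
tolerance = 0.05 * len(SPLIT_PERCENT)
print(f"sum = {total:.1f}% (expected 100% +/- {tolerance:.2f}%)")  # sum = 99.8%
assert abs(total - 100.0) <= tolerance
```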