mea8: web metrics

gotta take good notes this time: Susan asked me to fill her in while she’s in the XML forms presentation.

it’s a panel (3 women, including the last presenter I saw, and one guy, for those paying attention), 4 web people from new york state. they’re hoping for answers from us, too. almost everybody here is working with some sort of metrics.

phrases & meanings are fluid; "metrics" covers not just server logs (and what you can/can't do with them) but other kinds of measurements too.

dictionary-style definition of metric. usually quantitative, should always be consistent in both method & interval. possible distinction between metrics & analytics.

why use them? is anybody visiting the web site?! improve understanding of user behavior. (slides have the points she’s covering) also something to deal with problem people: the “important” site that turns out to have only 3 visitors. 🙂

what is a server log? recorded http requests, and it can get very big. not particularly human-readable. empire state uses the extended format. includes every object: a series of hits for a single page, one for the html, one for each of the images, stylesheets, etc.

field-by-field look at what’s in a single log entry. rfc931? for multiple domains on the same server (she didn’t know; that was a comment from the audience!). IP, authuser if any, date, the actual request (page), code (was it successful? 200 is good.), file size in bytes. that’s the common log, which is what we used to have. IP is what’s used by programs to determine visitors, date is used for determining sessions (in combo with IP).
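
(note to self for later: roughly what one of those common-log lines looks like and how a little script could pull the fields apart. the sample line, field names, and pattern are my own guesses from the description above, nothing they actually showed.)

```python
import re

# A made-up line in NCSA common log format: host, ident (rfc931),
# authuser, date, request, status code, bytes sent.
SAMPLE = '192.0.2.10 - - [12/Jun/2005:09:15:02 -0400] "GET /index.html HTTP/1.1" 200 5432'

# Rough pattern for the common format; real parsers handle more edge cases.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

match = COMMON_LOG.match(SAMPLE)
if match:
    fields = match.groupdict()
    print(fields["host"], fields["request"], fields["status"])
```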

extended includes… referrer (the previous page, if you clicked a link; blank if you typed the address or used a bookmark. particularly useful for search engine terms… which I mentioned in my presentation re: Google used internally.), user-agent (browser/OS), processing time (to render), cookies if any, translated URL (is that specific to domino? what other fields are used by specific servers?)
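
(same idea for the extended/combined format, which just tacks a couple more quoted fields onto the end of each line. again, the sample and the regex are invented for my own reference.)

```python
import re

# A made-up line in combined format: the common fields plus
# "referrer" and "user-agent" quoted at the end.
SAMPLE = ('192.0.2.10 - - [12/Jun/2005:09:15:02 -0400] '
          '"GET /admissions/ HTTP/1.1" 200 8120 '
          '"http://www.google.com/search?q=example+college+admissions" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')

COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

m = COMBINED_LOG.match(SAMPLE)
if m:
    print("referrer:", m.group("referrer"))
    print("user-agent:", m.group("agent"))
```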

example from WebTrends (what I use, tho I have an old version); that’s a familiar-looking report.

geneseo, talking about WT, getting people to understand the difference between a hit & a page view. pretty graphs. some faculty are very popular: one is in the top 20 for last month! can see use of authenticated sites. she’s been able to do it excluding on-campus computers, which is something I’ve had a hell of a time with.

a problem with a Windows/IIS exploit shows up as errors in the WebTrends reports.

browser reporting reliability? she doesn’t find it very reliable, but is just going with standards.

filtering for spiders/webbots? can be done, but not in this report. I should try that, see if it makes a difference. educating users about that (googlebot!) and the hits/page views issue.
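
(sketching what that filtering might look like so I remember to try it: throw out anything whose user-agent matches a handmade list of bot signatures. the signature list is my own guess, not something from the session.)

```python
# A rough filter for obvious spiders, keyed on user-agent strings.
BOT_SIGNATURES = ("googlebot", "slurp", "msnbot", "crawler", "spider")

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def human_lines(log_lines):
    """Yield only combined-format lines whose user-agent doesn't look like a bot."""
    for line in log_lines:
        parts = line.rstrip().split('"')
        # In combined format the user-agent is the last quoted field.
        agent = parts[-2] if len(parts) >= 7 else ""
        if not looks_like_bot(agent):
            yield line

sample = ('66.249.65.1 - - [12/Jun/2005:09:16:00 -0400] "GET / HTTP/1.1" 200 512 '
          '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"')
print(list(human_lines([sample])))   # -> [] (filtered out)
```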

deceptive results. (I really want to find that article from the analog guy, but I’m having bad google luck.) spiders/bots. distinct sites as separate reports. big sites can eat up the results of reports. the guy breaks his reports into internal/external; not perfect (fac/staff @ home) but useful. I probably need to do that to deal with the PY labs effect.
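
(if I do try the internal/external split, something like this is probably all it takes: check the client IP against the campus ranges. the network below is a placeholder, not our real one.)

```python
import ipaddress

# Hypothetical campus network; in practice this would be whatever ranges
# IT provides, maybe with the lab subnets broken out separately.
CAMPUS_NETS = [ipaddress.ip_network("192.0.2.0/24")]

def classify(host: str) -> str:
    """Return 'internal' or 'external' for a log line's client address."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "external"   # hostnames or junk: lump in with external
    return "internal" if any(addr in net for net in CAMPUS_NETS) else "external"

print(classify("192.0.2.77"))   # internal
print(classify("203.0.113.5"))  # external
```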

the other deceptive thing is caching. the log should be treated as a sample.

unique visitor stuff is wildly unreliable. AOL & Earthlink in particular, because of dynamic IP. same deal with time, length of visit. authentication is only server-based authentication, not scripting-based.
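
(for my own notes on why the visitor/session numbers are so shaky: the usual trick is to stitch hits from the same IP into one session whenever the gap is under some threshold, which is exactly what dynamic IPs and shared lab machines break. a bare-bones sketch, with a 30-minute gap picked purely by convention.)

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)   # conventional cutoff, nothing more

def count_sessions(hits):
    """hits: iterable of (ip, timestamp) pairs.
    A new session starts when the same IP has been quiet for more than GAP.
    Dynamic IPs (AOL, Earthlink) and shared machines make this an estimate."""
    last_seen = {}
    sessions = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        if ip not in last_seen or ts - last_seen[ip] > GAP:
            sessions += 1
        last_seen[ip] = ts
    return sessions

ts = lambda s: datetime.strptime(s, "%d/%b/%Y:%H:%M:%S")
print(count_sessions([
    ("192.0.2.10", ts("12/Jun/2005:09:15:02")),
    ("192.0.2.10", ts("12/Jun/2005:09:20:40")),   # same session
    ("192.0.2.10", ts("12/Jun/2005:11:05:00")),   # new session after the gap
]))  # -> 2
```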

q: filtering on-campus, additional filtering for specific machines? it’s a moving target, having to assume that the data isn’t complete, etc.; treat it as relative. one person has subdomains for parts of campus, so they can filter labs, etc. an audience member talks about not using filtering, because of constantly changing info. response: it helps to keep filters consistent.

comment from audience: accuracy of timing. apparently Urchin can use js to do more specific user tracking. for authentication, you can set it in the header of PHP, etc. (well, that makes sense. I don’t know what I’d *do* with it, but it’s interesting.) comment from the woman from Buffalo: it’s one of many tools. also, don’t use it for precise measuring.

can also make primitive homemade tools for analyzing specific terms.
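
(a primitive homemade tool really can be this small: walk the logged requests and tally the specific terms you care about. the watched terms here are just examples I made up.)

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Terms worth watching for; purely illustrative.
WATCHED = {"transcript", "blackboard", "library"}

def tally_terms(requests):
    """requests: iterable of request paths pulled out of the log.
    Counts how often each watched term shows up in query strings."""
    counts = Counter()
    for path in requests:
        qs = parse_qs(urlparse(path).query)
        for values in qs.values():
            for value in values:
                for word in value.lower().split():
                    if word in WATCHED:
                        counts[word] += 1
    return counts

print(tally_terms(["/search?q=library+hours", "/search?q=get+my+transcript"]))
```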

pause while switching screens. (guy from Hamilton) narrow tracking: campaigns. most success with analyzing small bits of site over short periods of time.

case study of stories on the home page using mouseover silhouettes: theories about how they were used, which ones get clicked on? they ran reports, page query terms (because the pages were generated dynamically from the database), created a homegrown tool to do the comparison. no difference in visits because of shapes, colors, highlighting: they couldn’t predict which, so it looked like people were mousing over all of them & picking the ones that interested them. the surprise was the middle-of-the-road stories: ordinary students doing well. not sure they capitalized on it.

case study: navigation. they had tab-based navigation, slightly different for portal users, on-campus but not logged in, and off-campus. lots of complaints about navigation. looked at DHTML menus to get rid of the tabs, problems with the library people 🙂 (“you should be happy with being on the academic services page”), they have very short menus. very little off-campus use of the library (but they are a live-in college in a small town). click-through analysis of their home page. the athletics site jumped way up when added to home navigation. people are going deeper from the home page into the site, in particular in the admissions area.

q: is it easy to track clickthroughs in Urchin? complications because of the technology they use; you have to follow it a page at a time. he thinks there’s a lot of garbage in it! comment from the audience that there’s a specific report for that, but it’s inaccurate.

q: menus accessible? degrade gracefully (and I just tested it myself, works well).

search results.

internal searches. he uses a homegrown search engine (they are CF people); the query gets dumped into a database, then goes through to webinator. on-campus: my hamilton, library, blackboard, maps, email. originally they didn’t link to their portal, and searches for the term went way up, so they added links. like us with the blackboard stuff. their search includes both their directory and their web search.
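
(the dump-the-query-first idea seems easy to steal. a minimal sketch of it, with sqlite standing in for whatever database they actually use and a stub where the webinator call would go.)

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect("searches.db")
conn.execute("""CREATE TABLE IF NOT EXISTS searches
                (query TEXT, searched_at TEXT)""")

def run_search_engine(query: str):
    return f"results for {query!r}"   # placeholder for the real search engine

def record_and_search(query: str):
    # Keep a copy of every query for later reporting...
    conn.execute("INSERT INTO searches VALUES (?, ?)",
                 (query, datetime.now().isoformat()))
    conn.commit()
    # ...then hand it off to the search engine itself.
    return run_search_engine(query)

print(record_and_search("library hours"))
```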

(where the hell did my quicktags go?)

off-campus search engine referrals. some weird stuff. “tasty d lite” and monopoly instructions?! I mentioned our discovery about using Google for internal searching, someone else mentions a staff page that people still ask for a lot. other colleges with the same name!
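
(this is basically the same trick as the internal Google thing: pull the q= parameter out of search-engine referrers and tally it. the sketch assumes the common q= convention; other engines name the parameter differently.)

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrers):
    """Tally the q= parameter from search-engine referrer URLs."""
    counts = Counter()
    for ref in referrers:
        parsed = urlparse(ref)
        if "google." in parsed.netloc or "search" in parsed.path:
            for term in parse_qs(parsed.query).get("q", []):
                counts[term.lower()] += 1
    return counts

print(search_terms([
    "http://www.google.com/search?q=monopoly+instructions",
    "http://www.google.com/search?q=tasty+d+lite",
]))
```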

q: what about those weird pages, can you do anything to make them lower? it’s an academic page, so they’re hands off. would like to have more effect on ordering with subsites, but never has the time.

(okay, now the mocha’s wearing off. and my brain is broken.)

empire state was able to capture unsuccessful search results, but a bug in the method? something mysterious happened in a server upgrade that they fixed. it’s becoming an interesting/helpful tool. how much is misspelling? (thank goodness for google!) found a lot of acronyms she didn’t know about. (military people!) “they can bloody well learn to spell”? but what about people like Elizabeth, for whom google’s respelling function is a lifesaver.

more problems with metrics stuff. case-sensitivity problem with Apache (Unix) vs. IIS (Windows) and Urchin, showing both versions (upper/lower) as separate results. empire state says almost all programs are case-sensitive, her server admin did something to make the whole log lowercase.
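
(her admin’s lowercase-the-whole-log fix amounts to folding case before you count, something like this. reasonable on IIS, where case doesn’t matter; on Apache two different-case URLs really can be different pages, so it’s a judgment call.)

```python
from collections import Counter

def page_counts(requests):
    """Fold case before counting so /About/ and /about/ land in one bucket."""
    return Counter(path.lower() for path in requests)

print(page_counts(["/About/", "/about/", "/ABOUT/"]))   # -> {'/about/': 3}
```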

q: webtrends is so expensive! she’s just using a single-user version (under $1000). comment: that’s the reason they switched to Urchin. (that’s why I haven’t upgraded mine in nearly 3 years; I wonder if that’s why my filtering problems are so frustrating.)

guy in front of me is using Deep Log Analyzer.

q: how do you explain to users how to use the reports? one-on-one meeting to show them the reports and explain them. one of buffalo’s training sessions is on how to use reports. (Urchin)

comment: the real-time server version of webtrends crashed their web server; now they’re looking into a 3rd-party system (WebSideStory) that doesn’t use server logs, priced by pageview. (real-time seems like overkill to me.)

comment: explaining to users that sitename/, sitename/index.html, etc. are all the same page. they’ve had “interesting adventures” with that issue too.
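
(the index.html thing is the same kind of normalization problem as the case one above. a quick sketch of collapsing the variants into a single page key; the list of default documents is just a guess.)

```python
from collections import Counter

DEFAULT_DOCS = ("index.html", "index.htm", "default.htm", "default.asp")

def canonical(path: str) -> str:
    """Treat /dept, /dept/ and /dept/index.html as the same page."""
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
    return path if path.endswith("/") else path + "/"

print(Counter(canonical(p) for p in
              ["/admissions", "/admissions/", "/admissions/index.html"]))
```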

geography stats are crap. (back at UWPC, 60%+ of our traffic looked like it was coming from virginia)

I missed something there because of a sidebar discussion re: the geography problem.

are we taking a break? need to stretch.

other data sources: missed the list. but I can guess.

other @ geneseo: survey, webmaster email, helpdesk. just starting to look at helpdesk questions. (in our case, what about the receptionist? helpdesk? student services?)

survey related to portal project, both what they’re doing now and what they want in the future. they had great turnout. search is not good; students asking for portal-type functions. long survey. their navigation is successful, lots of suggestions re: graphics, etc. going to faculty next week. can get results from her! using a survey tool, the one from VT. I’m noticing that lots of people are doing stuff with java/tomcat. I wonder if that’s because of the whole uPortal thing. (why isn’t the wa cc system looking at that instead of that crazy heidi project thing?!)

contact form like ours, but they have radio buttons to do some introductory funneling. uses a perl script to mine the email for common things, using it to generate a FAQ until portal is up. example: how to get a transcript. we had the same issue, and I found it the same way. she doesn’t get those emails at all any more; I still get some, but way less than we used to.
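
(her script is perl; here’s the same idea roughed out in python, since that’s what I’d reach for: scan the contact-form mail for recurring topics and count them up as FAQ candidates. the patterns are invented examples, not hers.)

```python
import re
from collections import Counter

# Topics worth watching for; purely illustrative stand-ins.
PATTERNS = {
    "transcripts": re.compile(r"\btranscript", re.I),
    "password resets": re.compile(r"\bpassword\b", re.I),
    "application status": re.compile(r"\bapplication\b.*\bstatus\b", re.I),
}

def faq_candidates(messages):
    """messages: iterable of email body strings from the contact form.
    Returns a count of how many messages mention each watched topic."""
    counts = Counter()
    for body in messages:
        for topic, pattern in PATTERNS.items():
            if pattern.search(body):
                counts[topic] += 1
    return counts

print(faq_candidates([
    "How do I get a copy of my transcript?",
    "I forgot my password, help!",
]))
```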

comment: use the form to actually give them the data right there.

q: what about the text boxes? we’ll just have to read. (sounds like our survey….20 pages of comments. but y’know, that was incredibly useful.)

comment: likes surveymonkey.

buffalo metrics toolbox: also uses usability studies, focus groups.

(can I make it coherently through this last halfhour? I feel like I’m melting down.)

partnered with psych faculty for focus groups!

keeps all the webmaster mail in a database. (her voice is too quiet to not be miked.)

uses Zoomerang for surveys.

took a stretch/bathroom break. woman from buffalo is still talking; I think it’s something about analyzing their search results (ultraseek?).

they do usability studies en masse, a bunch of students in a lab. uses a questionnaire. (I should ask her about that when I write for the training info.)

again, focus groups are done with faculty moderators. interesting, much like Lynn running our card sorts. they do sessions with both fac/staff & with students. more qualitative; you get the aha! moments. the faculty tabulated the results. I wonder if we could do that with business/marketing faculty, or even students. also a good way to identify stakeholders.

q: were focus groups recorded? all taped (audio?), also with the followup reports.

call for general questions: am I the only one out of it?

q: how do you get to that google stuff? I talked about how we saw it in our webtrends reports, and also about its availability thru google. somebody else said that the referrer address shows which searches are which (internal/external).

I’ve missed a few things looking at stuff on our site.

comment: searching is a predominant finding behavior. help them search!