May 18 ~ May 22

Week 1

The first week back was mainly about setting goals for the rest of the summer. One major goal for me was to shift the focus of my work from qualitative research toward development, while staying on track with both by dedicating at least one day a week to qualitative work. The development will center on constructing the tool, driven by our research.

So far, I have been maintaining a set of resource-user-tag data from Del.icio.us. Running my script periodically has yielded a little over 4000 resource-user-tag entries (more information is kept than that, but the triple is the identifying element). I intend to analyze this dataset, as it is a snapshot of information-seeking behavior specific to Lyme disease. It would be great to use it to find patterns that allow prediction.
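Because the same bookmarks come back on every run of the script, the dataset only grows when a new resource-user-tag triple appears. A minimal sketch of that bookkeeping follows, assuming each fetched record is a dict with hypothetical `url`, `user`, and `tags` keys (the original script and its storage format are not shown here), and assuming tags should be compared case-insensitively. The function name `merge_snapshot` is illustrative only.

```python
from typing import Iterable

# A triple identifies one tagging event: (resource URL, user, tag).
Triple = tuple[str, str, str]


def merge_snapshot(store: dict[Triple, dict], records: Iterable[dict]) -> int:
    """Add the triples from one crawl run to `store`; return how many were new.

    Each record is assumed (hypothetically) to look like
    {"url": ..., "user": ..., "tags": [...], "date": ...}. Fields other than
    url/user/tags are kept as metadata but do not affect identity.
    """
    added = 0
    for rec in records:
        meta = {k: v for k, v in rec.items() if k not in ("url", "user", "tags")}
        for tag in rec["tags"]:
            # Assumption: "Lyme" and "lyme" count as the same tag.
            key = (rec["url"], rec["user"], tag.lower())
            if key not in store:
                store[key] = meta  # first sighting wins; later runs are ignored
                added += 1
    return added
```

Running it once per scrape keeps the count of distinct triples honest even when most of a run repeats earlier data.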

By the end of the week, I had become more and more interested in exploring topics specific to information seeking. The literature survey so far has focused on algorithms for finding topics and clusters online, but I was lent a book on Information Foraging Theory (Pirolli) and have had a host of related papers recommended, so next week I intend to jump into an intensive survey of literature related to the various stages of information seeking online.

I also spent some time programming a MySQL database version controller (in PHP). It can be used quite generally to move between different versions of a set of tables within a database; the goal is that, when development is finished, all but one version can be removed, and the version controller middleman can be taken out reasonably painlessly. This was an important step for me to take on my own: version control is not built into the database, and I had never explicitly built anything like it before, so constructing it was good practice in manipulating the database with PHP.
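The original tool was written in PHP against MySQL and is not shown here. As a rough sketch of one way such a version controller could work, the Python below uses `sqlite3` as a stand-in for MySQL. Each version of a logical table lives in a physical table with a hypothetical `__vN` suffix, a view with the plain table name points at the active version, and "finalizing" drops the views and unused versions and renames the survivor, which removes the middleman. The class name and naming scheme are illustrative assumptions, not the author's design.

```python
import sqlite3


class TableVersions:
    """Hypothetical sketch: keep several versions of a set of tables side by side.

    Version N of logical table `name` is stored as the physical table
    `name__vN`; a view called `name` points at whichever version is active,
    so read queries keep using the plain table name. (SQLite views are
    read-only, so in this sketch writes go to the physical tables.)
    Table names are interpolated into SQL, so they must come from trusted code.
    """

    def __init__(self, conn: sqlite3.Connection, tables: list[str]):
        self.conn = conn
        self.tables = tables

    def physical(self, table: str, version: int) -> str:
        return f"{table}__v{version}"

    def create_version(self, version: int, schemas: dict[str, str]) -> None:
        # schemas maps logical table name -> column definitions, e.g. "url TEXT, tag TEXT"
        for table in self.tables:
            self.conn.execute(f"CREATE TABLE {self.physical(table, version)} ({schemas[table]})")

    def switch_to(self, version: int) -> None:
        # Repoint every logical name at the chosen version.
        for table in self.tables:
            self.conn.execute(f"DROP VIEW IF EXISTS {table}")
            self.conn.execute(
                f"CREATE VIEW {table} AS SELECT * FROM {self.physical(table, version)}"
            )
        self.conn.commit()

    def finalize(self, version: int, all_versions: list[int]) -> None:
        # Remove the middleman: drop the views and other versions, then rename the survivor.
        for table in self.tables:
            self.conn.execute(f"DROP VIEW IF EXISTS {table}")
            for v in all_versions:
                if v != version:
                    self.conn.execute(f"DROP TABLE IF EXISTS {self.physical(table, v)}")
            self.conn.execute(f"ALTER TABLE {self.physical(table, version)} RENAME TO {table}")
        self.conn.commit()
```

Using views as the switch means application queries never mention version numbers, which is what makes the final removal of the controller close to painless.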

It's really exciting to be back on this project. The best part is that I get to choose from so many things I could be doing, all of which I like. I don't feel overwhelmed yet, and I hope that I can contribute well. I am certainly learning a great deal already, and I cannot imagine that changing.

May 25 ~ May 29

Week 2

The main goal of this week was to survey literature on (1) information-seeking behavior, related to our research goals, and (2) machine learning techniques that might be useful. At the end of the week, I attended a fascinating (practice) tutorial on machine learning topics in natural language processing, which put much of the literature I had been exposed to in much-needed perspective. Additionally, I was lent a book on data mining, which should also be useful as I explore machine learning.

The del.icio.us dataset has grown, though not very much, since last week (not many things seem to be tagged with "lyme"), climbing to a lovely even 5000 on Sunday afternoon. (It is updated by running the script about daily.) During the week, I wanted to use SPSS to analyze some of the existing data; unfortunately, I am not familiar with SPSS and was not able to manipulate it as I had wanted. To rectify this, I borrowed a book on using SPSS, which I hope will help. Additionally, I found some literature about analyzing tag information like mine using algorithms on graphs, so I intend to use some of those as well, to both analyze and visualize the data. In the more theoretical literature on information seeking, I found some interesting work specifically about social tagging, and about what is, both practically and theoretically, a reasonable angle of analysis.
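The papers are not named here, so the following is one common way of putting tag data on a graph, not necessarily the method from that literature: a small Python sketch that builds a weighted tag co-occurrence graph from (resource, user, tag) triples, linking two tags whenever the same user applied both to the same resource. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(triples):
    """Weighted tag co-occurrence graph from (resource, user, tag) triples.

    Two tags are linked once for every post (same resource, same user)
    that carries both; the edge weight counts such posts.
    """
    posts = defaultdict(set)
    for resource, user, tag in triples:
        posts[(resource, user)].add(tag)

    graph = defaultdict(lambda: defaultdict(int))
    for tags in posts.values():
        for a, b in combinations(sorted(tags), 2):
            graph[a][b] += 1  # undirected: store the edge in both directions
            graph[b][a] += 1
    return graph


def top_neighbors(graph, tag, k=3):
    """Tags most often used together with `tag`, strongest first (ties by name)."""
    ranked = sorted(graph.get(tag, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]
```

For example, if one user tags a page `lyme`, `health`, and `ticks`, another tags a page `lyme` and `health`, and the first user tags a third page `lyme` and `treatment`, then `top_neighbors(graph, "lyme")` ranks `health` first (weight 2), followed by `ticks` and `treatment`. A graph like this can then be fed to clustering or layout tools for visualization.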

In short, I hope to learn a lot from applying what I have found in literature to the accumulated dataset (which will continue to grow, of course). I am very excited about this, as it seems like such a perfect - and meaningful - application of many concepts, including some of the clustering algorithms, machine learning concepts, and statistical analysis.

Not neglecting the qualitative end of the research project, I revisited the survey materials constructed in January (and piloted in the Spring) and composed an ad to be distributed, based on a finalized sampling scheme. The ethnographic study for this project has a survey, an interview, and, potentially, a diary study component. The interview materials were piloted in January, so the best use of time is to get the survey started as quickly as possible and go from there, collecting as much data as possible for analysis. The diary study has not been explored much yet; the limitations, but also the richness, of the del.icio.us dataset suggest that a good diary study might use a del.icio.us-style tool to track an individual's research process. I am looking into what this might look like more specifically.

We had our first group meeting this week; it was interesting, as I got to hear about other people's research in greater detail than I had before. Another student should be joining the chronic web project in the coming week, and I am really excited about meeting her and working with her.

June 1 ~ June 5

Week 3

I did a number of different things this week. First, I read much of the data mining textbook I had borrowed. Second, I developed a large part of a set of data gathering programs. They were based on two earlier tools: a specialized site crawler I had written in January, and the tool I wrote to get del.icio.us data. The main reason for writing a new set of programs was the need to run them on much larger datasets, since both original tools had issues not foreseen when they were first created.

I also organized a number of the documents about the qualitative study for this project, and took a few steps toward designing a diary study and setting the interview process in motion.

Some of the other undergraduate students (from the Quality of Life Technologies program) arrived, including one student I will be working with. As a result, the group meeting was stronger in numbers, and the graduate students told us about both the PhD program at CMU and local food options. With the influx of people, desk space became a bit of an issue, so another undergraduate student and I went to IKEA to get a very small desk to make better use of the space.

June 8 ~ June 19

Since I took a week-long vacation (from a Wednesday to a Tuesday) in the middle of the last two weeks, the remaining days will be referred to in this journal as "week 4".

This week, I did a lot of programming and testing for the data gathering tools. One of the major bottlenecks of data collection is that I am getting the data from the web. Either I make an enormous number of HTTP requests, which noticeably slows down a program that is supposed to crawl several thousand pages, or I keep the pages in memory, which is impractical and causes the program to run out of memory within minutes. At first, I thought the slowness of making so many HTTP requests was acceptable. However, on one of my test data points the program ran for almost 14 hours before I stopped it. It should not take that long for one individual piece of data when there will be hundreds!

This brings me to the conclusion that I need to save HTTP request results in a database, which is a whole new level of involvement for this particular data gatherer and will take quite a bit of doing.
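A minimal sketch of that idea in Python, using `sqlite3` as the database. The actual tool's database and schema are not described, so the table layout, the `pages.db` file name, and the `PageCache` class are hypothetical. Each URL is fetched at most once, and later requests are served from the table instead of the network; the fetch function is injectable so the cache can be exercised without network access.

```python
import sqlite3
import urllib.request
from typing import Callable


def download(url: str) -> bytes:
    """Fetch a page over the network (plain urllib, no retries or throttling)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


class PageCache:
    """Hypothetical sketch: store each fetched page so a URL is requested only once."""

    def __init__(self, path: str = "pages.db", fetch: Callable[[str], bytes] = download):
        self.conn = sqlite3.connect(path)
        self.fetch = fetch
        self.conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body BLOB)")

    def get(self, url: str) -> bytes:
        row = self.conn.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()
        if row is not None:
            return row[0]  # cache hit: no HTTP request, nothing held in memory
        body = self.fetch(url)  # cache miss: fetch once, then persist on disk
        self.conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (url, body))
        self.conn.commit()
        return body
```

Keeping the pages on disk avoids both failure modes described above: repeated requests for the same URL cost a lookup rather than a network round trip, and the crawler's memory use no longer grows with the number of pages seen.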

We also worked on making the journal/website for our project publicly comprehensible. Since we have a Drupal build, I really should spend time on that as well, but organizing the existing journal has already been quite time-consuming.

June 22 ~ June 26

Week 5

Besides learning a bit of ActionScript 3.0 (still in progress), I have finished my data gathering program. Although it is much nicer now, does more than it did before, and runs fantastically faster, I feel wary of further development. Specifically, during the last week, when I was intermittently working on the program for downloading online data, I managed not only to make it more comprehensive, but also to cut its running time from three or four days to an hour and a half. Several of the things I have since optimized turned out to have a much bigger impact than I expected; beyond that, there were also plain mistakes lurking between the lines. Though I have tested quite thoroughly at this point, and have learned a vast amount of practical information about dealing with the Internet from a programming perspective, I feel like I have reached only the tip of the iceberg.

I have continued to read related research literature, toward the end of the week focusing on medical articles rather than the usual computer science work. It is amazing to me just how rich a picture all of it allows me to see. I have taken up diagramming what I read (not everything, as it takes some time), and every time I make a diagram or read a paper, I come up with something I want to ask of my data.

And in order to ask anything of my data, I must be able to run machine learning algorithms, and visualize the results - and that is the goal for next week. Watch out, data iceberg?

June 29 ~ July 3

For the first time, I wrote instructions for a program to be read by the layman. Who is a layman? Where does he lay? What does he lay? These questions, and the real roadblocks I faced in the seemingly innocuous act of Going Beyond Documentation, are what this week was about.

Having mainly written programs for myself or a very limited set of others, such as specialized data collectors that were not intuitive, but got the job done and, eventually, became easy to manipulate, I did not expect just how long and tricky it would be to write directions for an application to be used by people other than me, and, what’s more, people whom I couldn’t coach in person on installation and usage.

For the last few days, the machine learning / data gathering / development work has been delayed, as I had to work on an application for the diary study we had designed. The application resembles a specialized, non-social, tiny del.icio.us, in that it lets participants fill out a form we developed about each particular URL. The application itself was not difficult to create. A desire to make it intuitive made it trickier, but the most unexpected source of trouble was the “installation instructions” pages. They needed screenshots of the application doing things in each of the major browsers (Firefox, Safari, and IE).

Firefox and Safari were not too much trouble, other than the curious case of Mac OS print-screen, and the removal of the cursor from screenshots (though there are applications, like XnView, that allow screenshots with cursors). Why is the cursor not in the screenshot? There are plenty of how-tos floating around the Internets explaining how to get around this, but I have not found out why this is not built into some native program. It’s okay to have to open a program rather than press some keys, but to have to find it, download it, restart it when it crashes, and repeat as necessary until frustrated and give-uppy – I’m amazed.

Additionally, Internet Explorer 7 and 8 have completely different philosophies about favorite links. One has a bar at the top and a pane on the left that you can pull out when you need it; the other denounces bars and has a big panel on the left of the screen with the favorites. So I needed two sets of screenshots for IE. This would have been marginally annoying, but ultimately fine, except that IE on my computer doesn’t do bookmarks – of any kind. It just says, “Unknown error occurred,” despite the fact that I have not used IE since 8th grade, and this is literally out of the box. Okay then. At home, I have a desktop with IE8, and one of the other undergrads has IE7 on her computer – whew. Crisis averted, give or take more hours than I would ever consider necessary – let alone appropriate – to produce a dozen fairly bland PNGs.

In other news, I played Spore last weekend, and it blew my mind how they can render all those little elements, have them interact, and make them look fantastic (the vast majority of the time). It’s also very cool that they have some work-in-progress programs on their site, and an API. I’ve never been much for computer games, whether playing them or learning about them, but it’s a wonderful product, and the apparently brilliant execution of a really, really complex idea is amazing and humbling. I wonder why it’s not as popular as it was when the original demo showed up on Google Video.

With all the talk we’ve been having on this project about visualizations, I can’t help but think about the bottleneck. Would it be the actual act of producing the picture, processing the input, or possibly even doing some calculations? It is kind of terrifying, actually, how hard it is to move beyond a really bad or really limited visualization, and how many beautiful (and surprisingly fast) things already exist.

July 6 ~ July 10

Hooray, participants!

After deploying the ethnography study, we've started to get some participants. That's really very exciting! I've set up a number of phone and IM interviews, and there was even some interest in the diary study application, which I discussed last week. Though the application itself was easy to create, the instructions and descriptions took me a lot of time and effort. I thought I had done a good job, but when some of the potential diary study participants responded with confusion, I wasn't so sure. And the most chilling part of this sort of thing is that, right now, I can see no way to substantially improve the instructions.

This Monday was the deadline for poster submissions to an assistive technologies conference. The poster we submitted mainly covered the work we did on this project during the Spring term, but looking back at that part of the project helped me think about the timeline we have followed so far and the timeline we'd like to follow.

In the spirit of this, we began outlining our expected contribution over the summer, and that was really helpful, as well, in formulating questions and methods to answer them. This feels really exciting.

July 13 ~ July 17

Week 8

This week started off with three interviews! This was a fantastic and exciting development. It's really interesting to read over the transcripts and discover things I hadn't noticed before. We have been working on an initial coding scheme, based on the pilot interview transcripts from January, so I'm really excited about applying the coding scheme we've come up with, working to expand it, and then seeing what comes of it. Additionally, I've been doing some more programming for data collection. I have a massive number of files downloaded now; it's actually kind of scary to think of everything we plan to do with them in terms of the qualitative analysis we outlined last week.

I've been getting to know some of the other undergraduates, and the master's and other graduate students here. That has been really great, more so than I expected to begin with. I especially enjoy our weekly group meetings. For example, this week we talked about (and briefly acted out!) a participatory design exercise that one of my mentor's graduate students was planning. That was very interesting, as we had had several discussions during prior group meetings about both this student's work (specifically her ethnographic contributions) and participatory design in general. I'm also fond of the half hour or hour right after the meeting, when everyone feels energized and tends to have conversations that are completely unrelated to work, yet endlessly fascinating. Sometimes those conversations give me ideas for future projects, or ideas relevant to current work, and it's really great to be part of that kind of thinking process, especially while still eating lunch.

In other news, I got the Grace Hopper Conference scholarship: HOORAY!! It will be the first conference I have ever attended, and I am very, very excited about it. My mentor was invited to give a talk this year, so I'm excited about that as well.

July 20 ~ July 24

Week 9

We continued to work on gathering data. A lot of time goes into preparing for interviews and conducting them; though the other undergraduate student working on this project has started conducting the interviews, I was present for those.

Increasingly, issues have come up with the demographic representation of our participants, so we've been thinking about possible sample bias. The population we surveyed, and subsequently interviewed, is far from representative of the general population, but it's unclear whether this is normal for the online population with chronic illness, and specifically with chronic Lyme.

The diary study is proving to be a bit of a mysterious creature: the (very) few participants using it are quite enthusiastic and have voiced no issues. On the other hand, a number of otherwise very forthcoming participants have had trouble with it. I'm not sure what the trouble is; I imagine it has to do with the installation procedure, which I admit is a bit complicated. I kick myself for not spending more time writing a better set of instructions, but maybe that wouldn't have made a difference anyway.

July 27 ~ July 31

Week 10

What really stands out to me from this week is a meeting that my mentor and I had with a medical professional who is working on this project. She described how, in the IRB protocols for the studies she is involved with, a standard requirement is that if a participant voices thoughts indicative of suicidal impulses or depression, the participant's PCP is notified. Though our population is defined by chronic Lyme disease, our IRB protocol does not, and cannot without massive changes, include that requirement. For one, a key feature of our study is its anonymity: participants need not provide anything but their email address, let alone their PCP's. Individuals with Lyme disease in particular, a much-disputed condition, would be extremely wary of a study that required disclosing such sensitive information. Indeed, many of our participants simply did not use names for the people involved in the stories they told.

In other news, a poster for a project that my mentor and I had worked on this past Spring was accepted to ASSETS, a conference on computers and accessibility. This is incredibly exciting: my first publication!! More about the study can be found on our website.

This weekend, I began the process of qualitative analysis by sitting down with a printed transcript, cutting it up, and gluing the pieces onto sheets of paper by category. This could have been done without the paper, glue, and scissors, but I don't think I personally could have done it without this preliminary exercise. It was extremely helpful, and I got a lot out of it in terms of what an appropriate segment size is for coding, and how to come up with categories and relate them.

I've been having some trouble keeping the files on this site up to date, and uploading my entries. Not technical trouble; simply the continuous realization that transcription takes a very long time and is very tiring. By the time I've achieved the day's goals, it's past midnight.

August 3 ~ August 7

Week 11

Most of the time this week was spent finishing the interviews that had been scheduled, transcribing some of the audio, and getting the rest transcribed by a company. Then, I began to create posters for each of the participants, which included all materials for that participant: survey, interview, and diary study (where applicable). There are many more participants in total right now, but a much smaller number who have been interviewed; the posters are for that subset.

The thing about transcription is that it takes a very, very long time. By the end, I had transcribed almost 10 hours of audio (the rest was transcribed by a transcription service), though that includes two hours from January. It felt like so much more. Regardless, transcribing yourself has been noted as a useful way of familiarizing yourself with the data; besides, good tools make it much less painful, and actually quite fun (in terms of reading and understanding a document very closely). I wrote a little something in the research journal last week about tools I found useful for transcription; see that entry.

To create the posters, I needed to summarize the interview data effectively, which involved putting the survey and diary data next to a "profile." Drawing on my reading of Interviewing as Qualitative Research by Irving Seidman, I followed this strategy (note: this took me a long time. I began on Thursday and did not finish until the next Wednesday, so this entry peeks over into the next week, too):

  1. Read the transcript and segment the participant's responses. The segments should be small enough to not carry too many parallel thoughts, but also big enough to avoid de-contextualizing a statement.
  2. Then, the segments should be categorized. This is not the same as coding: each segment gets only one category. I did this by copy-pasting them into a Word document. The categories were somewhat consistent, but if something didn't fit into an existing category, a new one was made, or an old one rephrased. The idea was to be as "bottom-up" as possible.
  3. At this point, I had a number of very large Word documents, which I proceeded to summarize into a "profile." This meant going through the contents of each category and paraphrasing a lot of things. I also kept each category to at most a page, which helps in printing out the "profiles," one category per page, and manipulating them by hand. This allows a freedom computers simply don't offer: the freedom to surround yourself with this kind of data and look at it all at once. I only did this for 5 interviews (a small subset of the total).
  4. Then, for every interview, I took either the profile where possible, or the categorized interview, and created a "Brief." The brief had several sections, which I determined to be the smallest number of questions that could be answered by the interview data to distinguish every participant from every other participant. The questions were along the lines of: The answers to those questions involved summaries and quotes from the interview. Needless to say, that's a very short summary of what was often a 20-page document, but ultimately making the briefs was a great exercise; they certainly help in distinguishing each participant and are thus a great reference point for this dataset. It's also nice that the Brief is only a page, sometimes less, as the answers to those questions can be quite short in summary form.
  5. Armed with print-outs of a brief, a pre-interview questionnaire (a short survey which asks, "What kind of online/offline resources do you use for giving/getting information/support?") and/or a list of resources determined from the interview (in several cases, the questionnaire was blank or missing), and a survey for every participant (except two of the pilots, who did not complete the survey), I created a poster for every participant and put them all over the walls of my mentor's office (she was expecting this).

And that's what I did this week.

August 10 ~ August 14

Week 12

So, the first half of this week was spent actually finishing up the stuff from last week - namely, making posters for participant interviews. A pretty detailed explanation is available in the journal entry for last week.

Then, we had a meeting where I hung up the posters, and other researchers (faculty on this project, including my mentor) walked around the room and were able to see all of the data we collected, in deeply summarized form. It was an intense, long, and fruitful meeting, in which we picked a direction for this year's CHI contribution, a much more concrete one than before.

The following day, another student, an incoming Master's student at the HCII, expressed interest in this project! So he and I met and talked at length about where the project currently stands, what kind of work has been done, and how he can contribute. It's nice to have more people working on this project.

As part of the general effort at the end of this week to organize and summarize protocols and materials, I put together an 11-page Interviewing Guide, which included the full interview protocol, the pre-interview questionnaire and related topics, and an interviewing reading list and FAQ. Among other things, it has helped me realize that I've polished an unexpected, yet valuable, data collection skill this summer: communicating with and interviewing participants.

August 17 ~ September 19

Week 13 and on!

As the summer wrapped up, I focused my efforts on finishing anything that needed to be done while still at CMU - such as meeting with a master's student who was going to be working on the project over the semester - and having some final in-person discussions of the data.

When I left to go back to school (yay, Obieland, how I missed you!), however, I was not really leaving the work. Indeed, since we were working to submit to CHI, I knew I was going to spend a lot of time on this project alongside my classes. And this I did.

Much of the grounded theory analysis took place on my computer, over email, and in conference calls. It was a stressful several weeks, but the overall result was rewarding.

The deadline has passed, and I have had time to recuperate. I have now given a talk on this research at Oberlin, my home school, and prepared a poster for the Grace Hopper Conference. Going forward, I plan to continue working on this project with Jen Mankoff remotely.