What will my transcript look like?


Although accuracy is the most important thing in any piece of audio transcription, presentation comes a close second – after all, no one wants to read large blocks of text. Unless you have any special requirements, this is how a transcript from me will look:

I’m a moderator or interviewer.  I won’t be identified by name unless requested, but everything I say will be in bold text.

Caroline Jones:  I’m the interviewee or respondent.  I’m identified by my full name the first time I speak, but just by my initials afterwards.  Everything I say will be in plain text.

Thanks, Caroline.  Who else might pop up in a transcript?

CJ:  Sometimes I’ll have a speaker who’s only given his or her first name.  In that case, they’ll be identified by their full first name throughout the transcript, like Mike here.

Mike:  Hi, how’s it going?

CJ:  Audio files with up to three speakers will have speaker IDs where identities are known, but for recordings with four or more speakers, such as focus groups, only the speakers’ genders will be given – M for male, F for female.
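For readers who want to apply the same labelling to their own files, the speaker ID rules described so far could be sketched roughly as follows. This is only an illustration: the function name `speaker_label` and its parameters are hypothetical, and the case where a speaker's identity is unknown in a small recording is not modelled.

```python
def speaker_label(name: str, gender: str, speaker_count: int, first_turn: bool) -> str:
    """Return the label placed before a respondent's speech.

    Hypothetical helper; parameter names are assumptions, not part of
    any real tool. `gender` is expected to be "M" or "F".
    """
    # Four or more speakers (e.g. a focus group): gender only.
    if speaker_count >= 4:
        return gender

    parts = name.split()
    # Only a first name was given: use it in full throughout.
    if len(parts) == 1:
        return parts[0]

    # Full name on the first turn, initials afterwards.
    if first_turn:
        return name
    return "".join(part[0].upper() for part in parts)


print(speaker_label("Caroline Jones", "F", 2, first_turn=True))
print(speaker_label("Caroline Jones", "F", 2, first_turn=False))
print(speaker_label("Mike", "M", 3, first_turn=False))
print(speaker_label("Caroline Jones", "F", 6, first_turn=True))
```

The four calls correspond to the cases shown in this sample: a full name on first appearance, initials afterwards, a first-name-only speaker, and a focus group participant.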

Doesn’t that get confusing if you have several speakers of the same gender one after another?

CJ:  Not really. Have a quick eavesdrop on this focus group.

M:  Well, I think chocolate should be banned.

F:  You can’t say that!

F:  If you did that, then I’d ban football.

F:  Not on your nelly, mate.

Okay, point taken.  Anyone else?

CJ:  Well, you might have brought a colleague or two along to assist you, like this person.

Hi, I’m a secondary moderator.  Everything I say will be in bold italics.  If I’m the only secondary moderator, I won’t have a speaker ID, but if there are a few of us, we’ll be tagged with ‘Int2’, ‘Int3’, ‘Int4’ etc. so you can tell us apart.  The moderator is Int(erviewer)1. We don’t usually pop up in large numbers, but sometimes we appear on panel interviews and suchlike.

So what happens if they’re all talking over each other or you can’t make out something due to background noise?

CJ:  For a single missed word, I’ll type (? xx.xx), where xx.xx is the time it occurs on the recording.  For more than one word, I’ll type (inaudible xx.xx), and if it’s more than five seconds of inaudible speech, I’ll type (inaudible xx.xx-xx.xx).  If I’m hearing something that doesn’t make sense or can have several spellings – for example, Kate, Cate and Cait all sound the same – then I’ll put (ph xx.xx) after the word, for ‘phonetic guess’.
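The marker rules above could be written down as a small sketch like the one below. It assumes that xx.xx means minutes and seconds into the recording, which the description does not state outright, and the helper names (`stamp`, `inaudible_marker`, `phonetic_guess`) are hypothetical.

```python
def stamp(seconds: float) -> str:
    """Format a position in the recording as mm.ss (assumed meaning of xx.xx)."""
    total = int(seconds)
    return f"{total // 60:02d}.{total % 60:02d}"


def inaudible_marker(start: float, end: float, single_word: bool = False) -> str:
    """Build the marker for speech that could not be made out."""
    # A single missed word gets a question mark and one time.
    if single_word:
        return f"(? {stamp(start)})"
    # More than five seconds of inaudible speech gets a time range.
    if end - start > 5:
        return f"(inaudible {stamp(start)}-{stamp(end)})"
    # Otherwise, more than one word gets a single time.
    return f"(inaudible {stamp(start)})"


def phonetic_guess(word: str, at: float) -> str:
    """Tag a word whose spelling or sense is uncertain."""
    return f"{word} (ph {stamp(at)})"


print(inaudible_marker(754, 754.5, single_word=True))
print(inaudible_marker(754, 757))
print(inaudible_marker(754, 762))
print(phonetic_guess("Kate", 65))
```

Whether a gap counts as one word or several is passed in by the caller here, since that is a judgement made while listening rather than something that can be worked out from the times alone.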

Useful.  Anything else?

CJ:  That’s about it.  Just one more thing – I’ll put a timecode, like this (TC: 00:10:00), every ten minutes of audio, so if you need to listen to a particular bit of the transcript, you have a rough idea of where it is in the recording.
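As a rough illustration of the ten-minute timecodes, the sketch below lists the markers a recording of a given length would contain. The function names and the `interval_seconds` parameter are hypothetical, and the hh:mm:ss layout is inferred from the single example (TC: 00:10:00).

```python
def timecode(seconds: int) -> str:
    """Format a position as (TC: hh:mm:ss), following the example above."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"(TC: {hours:02d}:{minutes:02d}:{secs:02d})"


def timecodes_for(duration_seconds: int, interval_seconds: int = 600) -> list[str]:
    """List the timecodes that fall within a recording of the given length."""
    # Start at the first ten-minute mark; nothing is placed at 00:00:00.
    marks = range(interval_seconds, duration_seconds + 1, interval_seconds)
    return [timecode(mark) for mark in marks]


# A 25-minute recording gets markers at ten and twenty minutes.
for marker in timecodes_for(25 * 60):
    print(marker)
```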

Thanks, Caroline, that clears up all my questions.