As I watched my Twitter stream flash past about a week ago (allegedly on holiday but never quite managing it) I noticed ‘lets not pretend opendata is simple’ scroll up. I can’t remember who the comment came from, but it rang very true for me. And here’s why.
I work in the digital team of a Communications team, which is a shared service between a Local Authority and an NHS Care Trust. Part of my job is to upload content, and some of this content is data – opendata. I like to think I have a reasonable grasp of such things, but as I walk the walk instead of simply talking the talk, I am discovering that things are not quite as simple as I anticipated.
Take, for example, publishing the Trust’s over-£25,000 spend, a legal requirement for us since April 2010. We have all our CSVs nicely published, with an introduction explaining why they are in the format they are in, and asking nicely that anyone with the inclination to do something ‘interesting’ with the data please let us know. All very well and good. I have control over the website, and I have friendly Communications Officers to write the blurb for me. All in my sphere of control.
What is not within my control, of course, is the signposting to the files, which is kindly provided for free by the data.gov.uk team. And there are some teething problems. Once I have created my initial file pointers, with all the URLs (web addresses) of the content I want them to publicise, I cannot edit any existing signposting, nor can I add any for the subsequent monthly submissions. The fantastic team at data.gov.uk, ably assisted by the National Archives team, have been nothing but responsive, and I have submitted all the information they asked for to help them resolve the problem. Dealings have been smooth and courteous. But I still need to email them each month with the file data.
Then someone in the Finance Department emailed me the Trust’s accounts for 2009-2010. In a spreadsheet. With multiple cross-referencing tabs and dependencies littered throughout, with tab x depending on calculations in tab a and tab b for its end sum. If I am to stick to the opendata principle which I set such store by, then I must pull that data out and put it into ‘machine readable format’. And so there I sat, copying and then pasting special through 30 pages, creating CSV file after CSV file. Text-only tabs went into PDF. They’re needed to understand the data in the previous and subsequent tabs that they refer to. Am I supposed to put that into CSV too? How will the machine read that? How will the machine (or the script running on the machine) know there is reference material which needs to be referred to?
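One possible way of approaching that last question, offered purely as an illustrative sketch rather than anything we actually did: publish a small ‘manifest’ file alongside the CSVs, listing each table and pointing to whichever notes document explains it. The function name `export_tabs`, the file name `manifest.json` and the field names below are all invented for the example, and it assumes the tab contents have already been pulled out of the spreadsheet with the calculated values in place of formulas.

```python
import csv
import json
from pathlib import Path


def export_tabs(tabs, notes, out_dir):
    """Write each tab to its own CSV file, plus a manifest describing them.

    tabs  -- dict mapping a tab name to its rows (each row a list of values)
    notes -- dict mapping a tab name to the notes file that explains it
    (Both structures are hypothetical, made up for this sketch.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    manifest = []
    for name, rows in tabs.items():
        csv_name = f"{name}.csv"
        # newline="" stops the csv module writing blank lines on Windows
        with open(out_dir / csv_name, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        manifest.append({
            "table": csv_name,
            "rows": len(rows),
            "notes": notes.get(name),  # None when no notes apply to this tab
        })

    # The manifest is the part a script can read to find the reference material
    with open(out_dir / "manifest.json", "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```

A script picking up the data could then read `manifest.json` first and discover that, say, `income.csv` comes with a notes PDF that a human ought to read. It does not make the notes themselves machine readable, but it does at least record that they exist and which tables they belong to.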
I’m left with 30 pages of CSV, a few PDFs and a list of unanswered questions. I want to be good. I want to be open. I want to be transparent. But oh my gosh, it’s not as simple as I thought it might be. Sometimes, you get given a file and you really do not know how best to proceed. And so, I come to this conclusion.
Collaboration, questioning and answering, knowing who to question, and having somewhere to record the answers are going to be key to this whole determination to get it right. I don’t have the answers. Do you?