As I watched my Twitter stream flash past about a week ago (allegedly on holiday but never quite managing it) I noticed ‘lets not pretend opendata is simple’ scroll up. I can’t remember who the comment came from, but it rang very true for me. And here’s why.
I work in the digital team of a Communications team which is a shared service between a Local Authority and an NHS Care Trust. Part of my job is to upload content, and some of this content is data – opendata. I like to think I have a reasonable grasp of such things, but as I walk the walk instead of simply talking the talk, I am discovering that things are not quite as simple as I anticipated.
Take, for example, publishing the Trust’s over-£25,000 spend, a legal requirement for us since April 2010. We have all our CSVs nicely published, with an introduction explaining why they are in the format that they are in, and requesting nicely that anyone with the inclination to do something ‘interesting’ with the data please let us know. All very well and good. I have control over the website, I have friendly Communications Officers to write the blurb for me. All in my sphere of control.
What is not within my control, of course, is the signposting to the files which is kindly provided for free by the data.gov.uk team. And there are some teething problems. Once I have initially created my file pointers, with all the URLs (web addresses) of the content I want them to publicise, I cannot edit any existing signposting, nor can I add any for the subsequent monthly submissions. The fantastic team at data.gov.uk, ably assisted by the National Archive team, have been nothing but responsive, and I have submitted all the information they requested to assist them in resolving the problem. Dealings have been smooth and courteous. But I still need to email them each month with the file data.
Then someone in the Finance Department emailed me the Trust’s accounts for 2009-2010. In a spreadsheet. With multiple cross-referencing tabs and dependencies littered throughout, with tab x depending on calculations in tab a and tab b for its end sum. If I am to stick to the opendata principle which I set such store by, then I must pull that data out and put it into ‘machine readable format’. And so there I sat, copying and then pasting special through 30 pages, creating CSV file after CSV file. Text-only tabs went into PDF. They’re needed to understand the data in the previous and subsequent tabs that they refer to. Am I supposed to put that into CSV too? How will the machine read that? How will the machine (or the script running on the machine) know there is reference material which needs to be referred to?
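One possible way to approach the ‘how will the machine know’ question is to publish a small index file alongside the CSVs, recording which files feed into which and which text-only document explains them. What follows is only a minimal sketch in Python, not an established standard: the function name `publish_tabs`, the `manifest.json` layout and the tab names and figures are all invented for illustration, and it assumes the values have already been pasted out of the spreadsheet with the formulas resolved.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish_tabs(tabs, out_dir):
    """Write each tab to its own CSV file, plus a manifest.json index.

    `tabs` maps a tab name to a dict with (hypothetical layout):
      "rows"       - list of rows, values only (formulas already resolved)
      "depends_on" - names of the tabs whose figures feed into this one
      "notes"      - name of a text-only file needed to interpret it, or None
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for name, tab in tabs.items():
        csv_name = f"{name}.csv"
        # newline="" is what the csv module expects when writing files
        with open(out / csv_name, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(tab["rows"])
        manifest.append({
            "file": csv_name,
            "depends_on": [f"{d}.csv" for d in tab.get("depends_on", [])],
            "notes": tab.get("notes"),
        })
    # The manifest is the part a script can read to find the reference material
    (out / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest


if __name__ == "__main__":
    # Invented example data: two source tabs and a summary that depends on both
    example = {
        "tab_a": {"rows": [["item", "amount"], ["salaries", 100]]},
        "tab_b": {"rows": [["item", "amount"], ["premises", 50]]},
        "tab_x": {
            "rows": [["item", "amount"], ["total", 150]],
            "depends_on": ["tab_a", "tab_b"],
            "notes": "notes_to_accounts.pdf",
        },
    }
    with tempfile.TemporaryDirectory() as folder:
        for entry in publish_tabs(example, folder):
            print(entry)
```

A script picking up the folder could then read `manifest.json` first and discover that `tab_x.csv` only makes sense alongside `tab_a.csv`, `tab_b.csv` and the notes PDF. It does not solve the problem of the notes themselves being unreadable to a machine; it only signposts that they exist.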
I’m left with 30 pages of CSV, a few PDFs and a list of unanswered questions. I want to be good. I want to be open. I want to be transparent. But oh my gosh, it’s not as simple as I thought it might be. Sometimes, you get given a file and you really do not know how best to proceed. And so, I come to this conclusion.
Collaboration, questioning and answering, and knowing who to question and having somewhere to record the answers, is going to be key to this whole determination to get it right. I don’t have the answers. Do you?
Louise, thanks for posting this. I think your experience is shared by others. I’ve only just come across this site, which is really useful.
I agree there needs to be more discussion about the often fiddly process of ensuring data is in a usable format, providing context to make it useful etc.
Sorry – I have no answers to your questions though!
Great post, and I share your pain in soooo many ways. I am a huge advocate of open data, but for those clamouring for it, they need to know that it’s not that straightforward and can require a lot of unpicking of data. And once all that hard work has been done, who’s to say someone will be bothered to do something interesting with it? As you say, going forward it’s also difficult. We’re struggling at WCC to keep the momentum going as we’re onto the new and next thing.
Hi Kate,
We’re experiencing similar things, I think. The ‘why can’t you just’ culture is meeting, in my case, local government and NHS culture head on and the two are very different. Large amounts of the opendata drive have passed by the people actually producing the data, and so often what arrives in our inbox is inappropriate – and there is no understanding from the sender as to why.
I’ve suggested a Q & A hub over on dotgovlabs to try and get around this problem but I don’t know if it’s an answer or simply a patch over a wider issue.
Great points, very well made!
To be honest, very few of us in local government outside of those with some ICT interest even know what open data means, let alone how they can help make it work. I work near our strategy, policy and performance people and they haven’t mentioned it once since I’ve been here, and that’s about five years now.
They are collecting data all the time and churning out spreadsheet after spreadsheet, but never once considering who else might need or want to see it and how things should be done. Until these people are fully up to speed I’m afraid it’s going to be more of that type of work for people like you!
Uncanny! A simple tweet (yes it was mine) and you somehow manage to extract all the same problems as I seem to have. Those problems aren’t technical stuff (although RDFa takes a bit of getting my head around and SPARQL is a nightmare) and it’s not for want of trying to get the data to publish. I take ‘We Love Local Government’’s point that sometimes IT people can get a bit silo-ish about this but in my experience there is a marked resistance from service teams to let this stuff out. And when it does come it’s a labour intensive process to get it formatted correctly. There are many LAs publishing ‘open’ data in PDF and although I don’t condone it at all I can fully understand why. Pragmatism.
The main problem I can see is that it takes too much time and nagging to convince service teams to release data and change systems/processes to accommodate the automated publication of data. Open data isn’t an IT thing – it’s a government thing. Mr Pickles’s push isn’t a technical one. If just one person in every directorate/service team in every council took an hour or so to read about what open data is and thought about how it might affect their service we’d be in a much better position. IT don’t own your webpages, IT don’t own your data. Why should IT be responsible for convincing you to make your data open?
Wow. I am not alone!
Open data champions, co-ordinated by central government, using data.gov.uk to ask questions on behalf of other people, giving advice and guidance on formats and generally being supportive and cheerleader shaped?
Doesn’t have to cost money, just needs online meetings every now and again, some online training delivered etc. It would help an awful lot.
OpenData is a great idea. But I believe the biggest challenge for OpenData is not so much the format, but what type of data should be published as OpenData.
TYPE must come before FORMAT.
Louise
I don’t know if it’s open data but this is what the accounting world is supposed to be moving to, albeit in the private sector first
http://www.xbrl.org/faq.aspx#31
XBRL is a language for the electronic communication of business and financial data which is revolutionising business reporting around the world. It provides major benefits in the preparation, analysis and communication of business information. It offers cost savings, greater efficiency and improved accuracy and reliability to all those involved in supplying or using financial data. It is an open standard, free of licence fees, being developed by a non-profit making international consortium. Other pages on this web site provide detailed information on XBRL, its technical features and its business opportunities.