Thursday, 27 February 2014

PPUK Reform

Overview

The Pirate Party UK (PPUK) is a political party in the UK which typically campaigns for civil liberties, copyright reform and a more transparent government. Full disclosure: I'm a full, membership-paying member of PPUK.

This is not a reform proposal in the traditional sense. It is a look at how PPUK acts and is perceived, and whether that can be changed for the better. Think of it as a perspective reform.

Where we're at

Historically, we've taken on campaigns at both the local and national level. We should keep on doing this, as both levels are required to effect our goals.

However, our campaigns have almost always been in opposition to something. For example, we opposed the withdrawal of Legal Aid, the detention of Chelsea Manning, the "snooper's charter", and so on.

When it comes down to it, we expend a disproportionate amount of energy saying "no" to our opponents, and we often celebrate their defeat. This gives the public (well, those who know about us; that's a whole other blog post) the impression that we are adversarial and against progress.

We need to acknowledge that the stated goals of these schemes may have merit, may bring substantial benefit to the public (e.g. care.data), and may even align with our own goals, but that the proposed implementation has issues. We then need to offer alternatives which still support the goal.

Further to this, we don't provide anything to the public: no tools for activists, little to help people get up and running in a local area, no how-tos for running a campaign, nothing.

The Future?

I feel that, once we're a larger organisation, it would be most helpful for us to conduct ourselves in a way that is consistent with the goals and principles laid out in our manifesto. Beyond that I feel that we should be aiming to improve our society.

I think this means that, instead of simply opposing problems (e.g. pharmaceutical greed, violations of civil liberties, invasion of privacy), we must support alternative solutions.

The simplest one from that list is the invasion of our privacy: we can support projects that improve people's privacy online (e.g. HTTPS Everywhere, Tor, GnuPG). We can provide infrastructure and offer our volunteers' skills (programming, design, UX, etc.), give these projects more public support and exposure, build our own complementary tools, educate people in their proper usage, and so on.

Personally, I feel that one of our most lacking areas is public education. We have so much knowledge, yet we consistently fail to share it.

At our most recent branch meeting, I asked how many members we have, and the numbers I got back varied from 300 to 700. I also happen to know that, for one of the posts in our most recent NEC elections, a mere 57 people voted. I strongly believe that this is due to disaffection in our own ranks caused by our perceived (and, in some cases, actual) lack of action. We must do; the future is not opposed, the future is built.

Friday, 21 February 2014

The Security of the Proposed care.data Scheme

Overview

Ceri the Duck has a blog post titled "Care.Data – why I am happy for my medical records to be shared for research". In this post, they make the argument that under the care.data scheme, data protection would be improved for patients.

In this post, I will tackle this misconception, which rests on two core arguments: firstly, that there will be limited access to identifiable information, and secondly, that the care.data scheme will provide a better security framework to work within.


No Identifiable Information

"care.data will only provide access to ‘potentially identifiable’ information"
If we look at the HSCIC price list, we see that they provide an extract of data "containing personal confidential data".

Even if this data has names and other directly identifying data stripped out, we know that, in the vast majority of cases, anonymised data can be de-anonymised almost trivially (further reading from Light Blue Touchpaper).

A Better Security Framework

"I am much happier with the level of data security care.data will provide than with the current ad-hoc arrangements. They will be consistent, with good oversight, the information disclosed will only be what is needed instead of having to comb through a patient’s full record, ..."
This is simply not true. The care.data scheme will take medical records from a setting where they are hard to sort through even with legitimate access (as the author themselves points out) to one where they will be much more easily accessible to many thousands of people, none of whom will have undergone any serious training in information security, data protection law, or the ethical issues surrounding the use and dispersal of this data. It is also highly unlikely that they will have had so much as a criminal record check.

Granted, the current situation is very poor, but it does not allow for large-scale abuse of the system. Sure, I could target Bob Smith, break into his doctor's surgery and steal his record. With the new system, I could target large swathes of the population simply by bribing the right people. The information gained could be used by all manner of people for all manner of purposes, from surreptitious background checks on potential dates to discrediting a political candidate, and everything in between.

And that's just bribery-based attacks. Think of the rubber-hose cryptanalysis opportunities, social engineering attacks, physical security attacks, phishing and spear phishing attacks, attacks on end-point security (your classic "hacking into the computer" attack), and so on.

Data transfer must occur at some point, and with great data transfer comes great opportunity. How do you conduct the transfer, and how do you set it up so that you're definitely sending the data to the person(s) you think you are? For the most part, these are considered solved problems in the cryptographic community (though key distribution, for instance, remains hard), but in practice they are anything but.

In short, this centralisation effectively paints a massive target on the back of the country's medical records, and gives access to institutions with some of the worst information security going in many places.

The adversaries in this situation will not be small time. Computer crime is big business, and aside from nation states, organised criminals are among the hardest adversaries to defend against. They are extremely well organised and well funded, with extensive experience attacking high-value targets. Once they've breached the system, they have the contacts to sell the data on and actually turn a profit from this kind of attack.

Conclusion


Security is far harder than most people think, and most people don't know how much they don't know. HSCIC cannot be in a position where it implements this system without being aware of the serious and numerous risks outlined above.

The care.data scheme will suffer a breach, and given how centralised the system is likely to be, I expect it to be a large and very serious breach of unprecedented proportions.

New care.data leaflet

Overview

There is a new scheme called "care.data", which will cause previously confidential medical records to be sent from a person's GP surgery to a central database.

Access to this database will then be sold for a fee.

The process of informing the public that their confidential medical records will become a commodity for those who meet the criteria for accessing them has been spotty at best, and downright misleading at worst.

Leaflet


We (my partner and I at least) will be hitting the streets to distribute an A5 leaflet to the general populace informing them that this is taking place.


This leaflet is licensed CC-BY-4.0, meaning that you are free to remix, reuse and redistribute it as you please, as long as you attribute my partner and me.

You can access the full files: SVG, PNG and PDF. If you feel that this is an important issue, the leaflet is quite amenable to cheap risograph printing, so get yourself a print run done, and get out there on the streets!

Sunday, 2 February 2014

The Bet

Overview

I have entered into a bet with my other half.

The bet is simple: that I can get her to the point where she is capable of running a half marathon in 6 months. She can't run a 5K yet.

About Me

I think of myself as "normal", but my parents are unsure where I get my drive to exercise from, and my other half seems to see me as some sort of superman, often using words like "inspirational" and "incredible".

Truth be told, I am not a superman; I'm just a guy who happens to do a little bit of running on the side. I've run a couple of 5Ks, a 10K, a couple of super sprint triathlons (if you're curious, it wasn't quite a super sprint; it was a 400m swim, a 20km ride and a 5km run) and a sprint triathlon (usual distances).

I do not think of myself as particularly sporty. I weigh between 56kg and 58kg depending on when I last ate, and I stand about 1.8m tall.

My History

When I was quite young, I swam. I swam a lot: I got my "honours" badge for swimming before I was in year 7, meaning I swam something like 1km in 40 minutes, and in that same session I extended it to 1.5km to get my 1.5km badge. Nothing like killing two birds with one stone. I went on to swim for my city in my age group for a year or two, but stopped when I went to senior school. At the time I thought I would be getting lots of homework, but looking back, I really don't know why I stopped; I knew I wasn't the sort of person to actually do my homework.

Outside of that, I didn't really do sport, and I certainly didn't run. Even doing breaststroke gave me issues with my knees. At school, we were made to do rugby and cross country running, where my knees were a serious issue. A bad tackle could cause my knee to dislocate, meaning that I'd be stuck squelching in the muddy ground clutching my leg until I was taken somewhere I could fix it. It was unpleasant. Cross country running was better, but it made my knees hurt a lot, so I ran very slowly. Slower than the overweight guys with asthma.

When I made it to sixth form, we were allowed to choose a sport. I chose badminton, which I knew I'd excel at from doing PE in previous years. I did do very well, often taking on the teacher and actually managing to put up a not-unreasonable fight, but always eventually losing. It was rare for me to lose to my peers.

After that, I went to university. I mostly stopped doing anything sport-related in my first year, but in my second year I was invited along to a pole exercise session, which I wasn't entirely awful at! I ended up performing at the university "woodstock" and teaching for a couple of years, so I didn't do too badly out of it.

Once I left university, I kept teaching at pole, but also took up running and entered a few triathlons. They were good fun, to say the least! After a couple of triathlons, I joined up with the Jitsu club and promptly put my back out.

After a year and a bit of recovery, I was back on the mat and back to running, but with not nearly as much dedication as before. I intend to go back to pole sooner or later, but I don't know when.

The Bet

While in the kitchen with my partner the other evening, our conversation turned to exercise. My other half does not consider herself sporty at all. I made the passing comment that if she gave herself over to me, I could probably have her running a marathon with around 6 months of dedicated training.

Obviously, she didn't believe me, and after a bit of cajoling and back and forth, we entered into a bet that I could have her running a half marathon in 6 months.

I've never run a half marathon, in training or otherwise, but I know what it theoretically takes to get there. I like a challenge.

My partner has a large complicating factor that we need to deal with. She has quite a serious anxiety disorder, meaning that panic attacks in public and extreme self-consciousness are two of our biggest hurdles. Without that, I'd say our biggest hurdle would be us simply agreeing to give up. She can do the running; she just doesn't believe that she can.

The First Run

The first run started out difficult. Letting her set the pace while running side by side is quite hard: I have much longer legs, so my natural pace is a bit higher than hers. I started to pull away, which upset her, and she said I was leaving her behind.

After she was warmed up, and I kept running a pace or two behind to make sure I was matching her pace, we managed to keep the run going, by simply running for 2 lamp-posts' distance, and then walking the same amount. She even said that while walking she was feeling lazy, because we weren't running!

Unfortunately, during the walk part of our run, a vicar cheerfully invited us to worship with his parish. I thought it was a lovely gesture, but he wouldn't have made it if he'd known what I thought about his god. I politely declined, kept my cool and walked straight on. We would have to run back that way. We did another run section shortly afterwards and turned round. After a brief walk, we were about to set off running again, but first I needed to help my other half not freak out about being "Jesused at", if I may verb my nouns. We got past this, and ran straight past the vicar with no offers of Jesus or other religious figures being politely sent our way.

When it came to the last run, I said that she should run for as long as she could. Before I could finish my sentence, I'd caused a panic attack. What I'd wanted to say was that, between where we were and home, she should run as much of it as she comfortably could, then walk the rest.

Once we'd dealt with the panic attack and I'd found the right words, we got back on our way. She managed to run all but about 5 metres of the rest of the distance home, proving to me (but not, I think, to her) that she can run for longer than she thinks.

I'm looking forward to running more with my partner.

Tools of the Trade

We both use Fitocracy for logging our exercise, and our diet is quite simple. We're on our way to being vegetarian, and we explicitly avoid foods that are very high in carbohydrates. I'm looking at you, pasta and rice. I think a plant-based diet will really help us in this.

Combined with these blog posts and our Fitocracy logs, we should be able to plot our progress when we look back at this.

Thursday, 26 December 2013

Searching with Bloom Filters in Haskell

Overview


This post is primarily about my experience with the bloomfilter package in Haskell.

Coding it up

Coding it up was straightforward. The only hitch was a late realisation that the package has no Hashable instance for Text, which prompted a quick change from Text to ByteString; aside from that, it was very easy.

Querying the index is easy too. I shall post the code when I am back home, since I'm with my parents over the Christmas period.

Overall, it's roughly 38 lines, but around 7 lines of that are simple module imports.

Testing

To test it, I grabbed roughly 2.5MB of data from Project Gutenberg, and simply started querying the index. I haven't done any "proper" performance testing so I've just been feeling it out with no timers.

I didn't bother forcing anything, so the first query can take a little while as the index is built, but it's nothing prohibitive. After that, every search feels instantaneous.

After building with -rtsopts, I had it generate a heap profile, which showed that the peak memory allocated with my test set was roughly 8MB, but that fell back down quite rapidly.

Improvements

I suspect that a carefully used deepseq, and ByteString.Lazy, could really improve the memory footprint of the system and remove the initial "hang" on the first search. I shall have to investigate this at a later date.

Extensions

If I could just get access to the underlying UArray Int Hash in a way that I could manipulate, I could make searching more probabilistic, but also much faster.

Similarly, I could extend the usage of a bloom filter into a similarity metric, but unfortunately, I'm not entirely sure how to use the underlying representation at this time.

I basically need access to the cardinality of the underlying representation, and the ability to union and intersect the underlying bit arrays. Once I have these (or have worked out how to get them from the underlying representation), similarity matching becomes very simple.
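The following Python sketch shows what those three operations look like once the bit array is exposed. It is not the bloomfilter package's API. It assumes each filter is stored as a plain integer used as a bit array, and that both filters were built with the same size and the same hash functions. The Jaccard-style ratio is one common choice of similarity measure, used here as an illustrative assumption.

```python
def cardinality(bits: int) -> int:
    """Number of set bits in the filter (its popcount)."""
    return bin(bits).count("1")


def union(a: int, b: int) -> int:
    """Bits set in either filter: equal to the filter of the combined token sets."""
    return a | b


def intersection(a: int, b: int) -> int:
    """Bits set in both filters (can include bits that coincide only by chance)."""
    return a & b


def similarity(a: int, b: int) -> float:
    """Jaccard index over the two bit arrays: |A AND B| / |A OR B|."""
    either = cardinality(union(a, b))
    if either == 0:
        return 0.0  # two empty filters: treat as no evidence of similarity
    return cardinality(intersection(a, b)) / either


print(similarity(0b1011, 0b0011))  # 0.6666666666666666 (2 shared bits of 3 set)
```

In Haskell, the same three operations correspond to `popCount`, `(.&.)` and `(.|.)` from `Data.Bits`, applied word by word across the underlying array.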

Conclusion

Haskell's bloom filter library seems to be very fast whilst maintaining an easy to use facade.

However, it is not flexible enough to let you play with bloom filters in all the ways you easily could. Perhaps basing the underlying representation on the bitset package, or offering a "serialisable" version which can be easily manipulated, would be all it takes.

Wednesday, 25 December 2013

Client-Side Similarity Matching

Overview

This post is entirely inspired by a post I had recently seen on HackerNews, specifically, "Writing a full-text search engine using Bloom filters", by Stavros Korokithakis. I then went off and did a bit of Googling, and came up with a paper by Jain, Dahlin and Tewari.

The Full-Text Search

Korokithakis' method is actually rather ingenious: take your documents, tokenise them, and enter the tokens into a bloom filter. This gives a highly compact representation of the set of tokens in each document, which is extremely efficient to query. Take the bloom filter, and add it to a map of DocumentId -> BloomFilter.

To search, one tokenises the query, then iterates over the values (BloomFilters) of the map, checking whether the query tokens appear in each bloom filter. If they do, the document is considered a hit in the search.
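Here is a minimal Python sketch of the scheme as described; it is not Korokithakis' own code. It assumes a hand-rolled filter with 1024 bits and 3 positions per token derived from SHA-256, both arbitrary choices. It counts a document as a hit only if every query token is found, which is one reading of the description.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter: an int used as an m-bit array, with k salted hashes."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token):
        # Derive k bit positions from SHA-256 digests of "i:token".
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def __contains__(self, token):
        # False: definitely absent. True: probably present (false positives possible).
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def tokenise(text):
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each DocumentId to a Bloom filter of its tokens."""
    index = {}
    for doc_id, text in documents.items():
        bloom = BloomFilter()
        for token in tokenise(text):
            bloom.add(token)
        index[doc_id] = bloom
    return index


def search(index, query):
    """Return ids of documents whose filter (probably) contains every query token."""
    tokens = tokenise(query)
    if not tokens:
        return []
    return [doc_id for doc_id, bloom in index.items()
            if all(token in bloom for token in tokens)]


documents = {
    "moby-dick": "Call me Ishmael. Some years ago, never mind how long precisely",
    "pride-and-prejudice": "It is a truth universally acknowledged, that a single man "
                           "in possession of a good fortune, must be in want of a wife.",
}
index = build_index(documents)
print(search(index, "Ishmael"))  # always includes "moby-dick"; others only via false positives
```

Bloom filters never give false negatives, so a document containing every query token is always returned. The price is occasional false positives, and their rate grows as more tokens are packed into each fixed-size filter.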

It doesn't take much imagination to extend the tokenising to stop-word filtering, stemming, n-grams, and all the other lovely methods used by more traditional full-text search engines to make the search more usable by humans, and to extend the "relevance" beyond hit/no hit (e.g. more hits = better is one improvement to the described scheme that I can think of).
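Two of those extensions, stop-word filtering and "more hits = better" ranking, take only a few lines. This Python sketch again uses a hand-rolled filter, with an int as the bit array and SHA-256-derived positions. The stop-word list and filter parameters are illustrative assumptions.

```python
import hashlib
import re

NUM_BITS, NUM_HASHES = 1024, 3
# Illustrative stop-word list; a real engine would use a much longer one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenise(text):
    """Lower-case, split into word tokens and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def positions(token):
    """k bit positions for a token, from salted SHA-256 digests."""
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def make_filter(text):
    """Bloom filter (an int used as a bit array) of a document's tokens."""
    bits = 0
    for token in tokenise(text):
        for pos in positions(token):
            bits |= 1 << pos
    return bits


def might_contain(bits, token):
    return all((bits >> pos) & 1 for pos in positions(token))


def ranked_search(index, query):
    """Order documents by how many distinct query tokens (probably) appear in them."""
    tokens = set(tokenise(query))
    scored = []
    for doc_id, bits in index.items():
        hits = sum(might_contain(bits, token) for token in tokens)
        if hits:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most hits first, then by id
    return [doc_id for _, doc_id in scored]


docs = {"a": "The whale and the sea", "b": "The whale", "c": "A quiet drawing room"}
index = {doc_id: make_filter(text) for doc_id, text in docs.items()}
print(ranked_search(index, "whale sea"))
```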

Similarity Searching

This is where Jain, Dahlin and Tewari's paper comes in handy. They used bloom filters for detecting similar web pages when crawling a site.

Their method was to index the web pages using a bloom filter, but with a different tokenising method, and then simply count the proportion of matching bits in the resulting bloom filters. They then have some nice analysis to show how "similar" two documents can be considered to be if their bloom filters have a high proportion of set bits in common.
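The paper's own analysis is not reproduced here. The standard Bloom filter occupancy estimate, however, shows why a raw count of matching bits needs a baseline. Assume independent, uniformly distributed hash functions. Then an m-bit filter with k hashes holding n distinct tokens has each bit set with probability p(n). It follows that two documents with no tokens in common still share, in expectation, the number of set bits given on the right:

```latex
p(n) = 1 - \left(1 - \frac{1}{m}\right)^{kn} \approx 1 - e^{-kn/m},
\qquad
\mathbb{E}\left[\,|A \wedge B|\,\right] = m \, p(n_A) \, p(n_B)
```

Take m = 1024, k = 3 and 100 distinct tokens per document. Then kn/m ≈ 0.293 and p ≈ 0.254, so each filter has about 260 bits set, and roughly 66 of them coincide by chance alone.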

Onto the Client

This is very simple: you parcel up your index, but instead of searching it, you take the current document's already computed bloom filter (no need to send over the pesky tokenisation code!) and do a bitwise AND with each of the other filters in the index. The ones with the highest cardinality (number of set bits) are the ones most similar to your selected article.
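Here is a minimal Python sketch of the client-side step. It assumes the index arrives as a mapping from document id to a filter stored as an integer bit array. The 8-bit filters below are toy values, not real data.

```python
def popcount(bits):
    """Cardinality of a filter: the number of set bits."""
    return bin(bits).count("1")


def most_similar(index, current_id, top_n=5):
    """Rank other documents by |current AND other|, the count of shared set bits."""
    current = index[current_id]
    scored = [(popcount(current & bits), doc_id)
              for doc_id, bits in index.items() if doc_id != current_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest cardinality first
    return [doc_id for _, doc_id in scored[:top_n]]


# Toy 8-bit filters standing in for precomputed document filters.
index = {
    "a": 0b1111_0000,
    "b": 0b1110_0000,
    "c": 0b0000_1111,
    "d": 0b1100_0001,
}
print(most_similar(index, "a"))  # ['b', 'd', 'c']
```

Raw AND cardinality favours documents whose filters have many bits set, such as long documents. Dividing by the cardinality of the OR, as in a Jaccard index, is one way to normalise for that.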

This is also the kind of calculation that could be done once, at index generation time.
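Doing it at index time could look like the following Python sketch; the function name and toy filters are hypothetical. Every pair is compared once in each direction, so the cost grows quadratically with the number of documents. The client then needs only the small related-documents table rather than the filters themselves.

```python
def popcount(bits):
    """Number of set bits in an integer bit array."""
    return bin(bits).count("1")


def precompute_related(index, top_n=3):
    """At index-build time, store each document's top_n most similar documents."""
    related = {}
    for doc_id, bits in index.items():
        scored = sorted(
            ((popcount(bits & other), other_id)
             for other_id, other in index.items() if other_id != doc_id),
            key=lambda pair: (-pair[0], pair[1]),  # most shared bits first, then by id
        )
        related[doc_id] = [other_id for _, other_id in scored[:top_n]]
    return related


index = {"a": 0b1111_0000, "b": 0b1110_0000, "c": 0b0000_1111, "d": 0b1100_0001}
print(precompute_related(index)["a"])  # ['b', 'd', 'c']
```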

Monday, 25 November 2013

My Experience with Temporary Blindness

How it Happened


For those who don't know me personally, I wear some rather strong glasses. Without them, I have a natural focal length of something like 13cm (that is to say, anything beyond 13 centimetres is just a blur). In recent years, I've switched to contact lenses, continuous-wear ones to be specific. These allow me to lead a life which is reasonably free of glasses, and to engage in certain sports that were just prohibitively difficult before.

Fast forward to the 12th of November, when I took my lenses out because I had some pain in my eye. The next morning, at around 0530, I woke up with an unexpectedly painful eye. It had swollen shut. So bad was the pain that my other half took me to the accident and emergency department to get it looked at.

I was told that I had a serious infection of the cornea, or an ulcer, and was at risk of losing the eye. I was given muscle relaxant eye drops, antibiotic eye drops and (at the hospital) a local anaesthetic, then discharged. The basic effect of the infection (confirmed as such at a later appointment) was that any movement of the eye was quite painful, including the movement of my iris (hence the muscle relaxant). Any movement triggered by my "good" eye also caused me a lot of pain, so using either eye was right out.

So I was essentially stuck in a position where my least painful route for the day was to wear a blindfold, take my medications at the allocated times and take normal, over-the-counter painkillers. This was my state for two days. I could not see at all.

Predictably, I took the time off work; since I ride my bike into work, going in could've ended badly!

My Life, Blind

I work at the local university as a researcher, this actually sounds far more impressive than it is -- I normally just end up writing Java on a Linux box. At home, I rely on Linux Mint 14.

Overall, the experience was patchy. I asked my other half to do a quick Google search to find out what the screen reading software for Linux Mint 14 was, and the answer was something called "Orca". I used my limited knowledge of keyboard shortcuts to get to a terminal, run aptitude install orca, log out and back in, and start Orca.

The terminal worked mostly well, except that I use zsh, which led Orca to read out a lot of superfluous output that you tend to filter out when reading it yourself, only paying attention to it when you need it. This is not the case with a screen reader: everything (including my verbose prompt) is read at you all the time, after every command. At first it is very overwhelming.

I tried to use Miro to get some podcasts to pass the time, and while using aptitude to install it was easy, using Miro was anything but. I ended up switching to my usual media player, Banshee. Banshee required a restart to pick up the fact that I was using a screen reader, but it worked OK after this. It was easy enough to navigate; however, it was very difficult to select a specific podcast episode (I could select things like favourite songs, podcasts, all items, new items, and the various podcast "channels" that I'd subscribed to).

So far I've been getting through the Haskell Cast episodes that I have wanted to listen to for a while. I plan to listen to a few more podcasts in my suddenly spare time.

Moving around my flat was actually perfectly OK; it just took me a little longer. I mostly walked around with my hands stuck out or a hand on a wall. I could still do everything in the bathroom, but I was barred from my own kitchen. I didn't have to leave the house, except for going to the hospital, for which I had a guide.

The evening after I got back from hospital, I had a friend round to visit. Luckily, she knew about my predicament and brought round a board game. It would seem counter-intuitive that a board game would be any good, but she brought round Tokyo Monster Something-or-other. It actually worked really well, with my other half telling me what I had rolled, who was currently in Tokyo, and other information. I can't imagine any of my board games being even slightly accessible, and we definitely couldn't have played Xbox or watched something on Netflix. But that's somewhat OK, since I haven't exactly tailored my living room for that.

Even Pidgin worked really well. I use Google Talk at work and DukGo's XMPP server for comms with my friends, so I was well served by Pidgin, and it worked flawlessly with the screen reader. In terms of accessibility, its ease of use exceeded even that of Orca's own preferences panels!

Isolated

This did not stop me getting quite isolated. I couldn't pass the time on my Xbox or watching TV, since I don't think Netflix offers audio description on its offerings.

I couldn't use my Kindle either; I've now noticed that there isn't even a headphone socket on the thing, so I couldn't read any of the books I've bought.

Worse than this, however, were the things which did not work with a screen reader. The main culprit on Linux Mint 14 was Firefox (Fx).

I could alt-tab through windows and have the title of each one read to me perfectly, but when I switched to Fx, I was greeted with the most deafening silence of the whole experience. I tried to get to the preferences panel blind, using as much as I could remember, but I think I probably just changed some obscure settings.

I ended up using Links in the terminal. If you think this is painful when sighted, try it with Orca! Some websites "worked", but it did mean sitting through a huge list of navigation (or worse, adverts) on every page load. A few pages offered a "Skip to Content" link (I think /. was one of them), which was met with absolute glee.

Sites like reddit were quite unusable, mostly because of the list of subreddits across the top. HackerNews worked: I could read headlines after waiting through an amount of navigation that was almost (but not quite) unbearable. The links that I followed were a mixed bag, and actually working Links can be a bit hit and miss, especially when you're disorientated and not sure whether you've managed to exit the terminal window altogether or done something else entirely.

When there's a lot of navigation on a page, it becomes hard to stop yourself zoning out while it is read at you; then you realise that you're in the middle of the content and need to back up somehow.

For reference, I also tried to install Chromium, and was met with substantially the same problems -- silence.

Even making notes for this blog was not a walk in the park. I ended up using GEdit. Actually writing the document was reasonably easy; saving it was more difficult. After hitting Ctrl-S, I was a little overwhelmed and confused by the number of things it said to me, but I ended up navigating via Tab. (Also worth noting: "push button" is very difficult to discern when said by a screen reader.)

Conclusion


I would say that prior to this issue, I was already sensitive to accessibility issues: having a sight problem means that I often end up zooming pages when my eyes are tired, or have to wear my glasses at work. But nothing could've prepared me for how incredibly disorientating it is to be blind at a modern computer.

Overall, computers seem somewhat usable even if you're not used to being without sight and are dumped into having to learn how to use a screen reader, but if you're unfortunate, you'll be cut off from the web. This should not be possible with a distribution like Linux Mint and a modern web browser like Firefox. I'm glad that I was able to get around, and that with some help I was able to get my podcasts playing and write notes about my experiences, but it shows that it is overly difficult to get things to just work when you really need them to.