Experiments in Alchemy: The Savile Files Mega-Graph (alpha release)
Over the last few weeks we have been experimenting with Alchemy in our laboratories as well as other forms of techno-wizardry.
AlchemyAPI is a Natural Language Processing service that automagically picks out entities from a block of text and semantically tags them. It can do this with people, places, companies, organisations, job titles and a variety of other categories.
It also keeps a word-count of each entity, which we used to size each node in the graph to represent the importance of the protagonists. An experimental feature attempts to work out the ‘sentiment’ of the entity as it appears in the context of the text.
The graph is based on the first 3380 pages of the ‘Savile OUTED as a PAEDO’ mega-thread on the David Icke Forums. For those who have never visited the thread, it is a bit like an online Paedopedia Britannica that has been investigating the Savile case and the wider problem of child abuse since October 2012.
Like the planet Saturn, the node ‘Jimmy Savile’ unsurprisingly dominates the graph as it pulls thousands of ‘satellites’ into its orbit. This includes the good, the bad and the royal alike. Sized strictly by its word-count, his node would have been many times bigger and would take up half the screen.
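For the curious, here is a rough sketch of how a word-count might be turned into a node size without letting one giant node swallow the screen. It is not the code behind our graph: the function name `node_size`, the logarithmic scaling and the default limits are all assumptions chosen for illustration.

```python
import math


def node_size(word_count, min_size=5.0, max_size=100.0, scale=10.0):
    """Map an entity's word-count to a display size.

    Hypothetical helper: log scaling compresses very large counts,
    and max_size is a hard cap so no node can dominate the canvas.
    """
    if word_count < 1:
        return min_size
    size = min_size + scale * math.log10(word_count)
    return min(size, max_size)
```

With these defaults, an entity mentioned 10 times is drawn at size 15.0, and anything enormous is simply held at the cap of 100.0.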
Over the next few decades, we will be looking to refine the graph and add the all-important connections between nodes. A sample has been added to the ‘Jimmy Savile’ node to show you what the connections will look like.
Of course, by crowd-sourcing this work amongst many people it need not take this long to complete. If enough interest is shown we will provide a simple web interface to allow this to happen.
We’ve seen what ‘white horse’ has been doing with ‘The Brain’ and would be keen to share our data and processes and see if we can combine resources.
The mega-thread has been a force of nature that has directly influenced the police, media and government. But it is unstructured data, hard to navigate due to its size and lack of indexing or GUI, and infiltrated by disinfo agents. It is what it is: a pseudo-anonymous internet forum, often based on second- or third-hand information or, more usually, on articles found on the web, mostly from the mainstream media. In other words, the information flow isn’t first class, which is pretty much standard for ‘surface web’ data, especially on a forum discussing subject matter of this nature.
Despite this, the wisdom of the crowd and the research skills and integrity of the regular posters cannot be discounted and the graph appears to show a fairly comprehensive and accurate picture of everything surrounding the Savile Exposure.
Once the dataset was processed through AlchemyAPI, 25333 nodes in total were discovered, including over 13000 persons, verified and semantically classified. Through Alchemy, we could also deduce the sentiment of each node, meaning how it was perceived in the context of the surrounding text. This is an experimental technology and it threw up anomalies, although in general it proved fairly accurate.
Clicking or touching a node brings up a profile window on the left-hand side. The nodes are not currently arranged in any particular order.
There is still a lot of noise and duplication that has to be cleaned up and normalised. For instance, Jimmy Savile has over 20 monikers that we will combine into one entity. However, we have been reluctant to remove too much data at this stage, until further analysis and verification can be completed.
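Combining monikers could look something like the sketch below. The alias table is a made-up illustration with only a few variants (our real list is not reproduced here), and `merge_aliases` is a hypothetical helper, not part of AlchemyAPI.

```python
from collections import Counter

# Hypothetical alias table: lower-cased variant -> canonical name.
# Illustrative entries only; a real table would be built by hand.
ALIASES = {
    "jimmy savile": "Jimmy Savile",
    "sir jimmy savile": "Jimmy Savile",
    "savile": "Jimmy Savile",
}


def merge_aliases(counts, aliases=ALIASES):
    """Fold word-counts of known variants into one canonical entity."""
    merged = Counter()
    for name, count in counts.items():
        # Normalise case and collapse runs of whitespace before lookup.
        key = " ".join(name.lower().split())
        # Names with no known alias are kept exactly as they are.
        merged[aliases.get(key, name)] += count
    return dict(merged)
```

Because unknown names pass through untouched, nothing is deleted at this stage; variants are only folded together once they have been added to the table.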
A drawback with AlchemyAPI, which has more to do with the ambiguity of place-names, is that it mistook many UK places to be in the U.S. This is understandable considering that America borrowed many British place-names. We are currently rectifying this manually before we geocode the locations and graph them, so that a map of the world will appear centred on the UK.
You should allow around 30 seconds for the graph to load. The interface is fairly intuitive. A navigation tip is to use the ‘map-view’ in the bottom right-hand corner along with the magnifying glass and zoom control to bring the small nodes into view.
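The manual correction amounts to an override list applied before geocoding. A minimal sketch, assuming each place arrives with a country tag; the names `UK_OVERRIDES` and `fix_country` and the three sample entries are ours for illustration, not our working list.

```python
# Hypothetical override list of place-names that, in the context of
# this thread, should be read as UK locations. Illustrative only.
UK_OVERRIDES = {"birmingham", "boston", "richmond"}


def fix_country(place, tagged_country, overrides=UK_OVERRIDES):
    """Return the country to geocode against for a tagged place."""
    if place.strip().casefold() in overrides:
        return "United Kingdom"
    # Anything not on the list keeps the country it was tagged with.
    return tagged_country
```

For example, `fix_country("Birmingham", "United States")` gives `"United Kingdom"`, while a place that is not on the list keeps its original tag. A blanket override like this only makes sense because the thread is overwhelmingly about the UK.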
Hopefully it won’t crash if all 5 of us try to load it at the same time. If it’s unresponsive then try again later.
Let us know what you think and how it can be improved and expanded.
Link to graph: http://5ocietyx.com/test/gexf-js-master/index.html