One in 50 American children has autism, according to the latest figures released by the Centers for Disease Control and Prevention in March. One of the winners of the YarcData Graph Analytics Challenge, announced in April, could help researchers better understand the causes of the disorder.
The second-place entry, by Adam Lugowski, Dr. John Gilbert, and Kevin Dewesse of the University of California at Santa Barbara, used a dataset created for the Mayo Clinic Smackdown project. That dataset has the same structure, property types, and scale as the medical organization’s actual Big Data sets around autism, but it uses publicly available data in place of the real thing. The team can’t use the real data because it includes private information about patients, diagnoses, prescriptions, and the like.
But the actual data deployed for the project doesn’t matter, according to Lugowski. “The goal is to find relationships we have never thought of before, and this way it doesn’t prejudice the algorithm,” he says. Running on YarcData’s uRIKA graph analytics appliance, the algorithm queries the Smackdown dataset to discover commonalities in the data, mimicking how the real data sets could be queried in search of common precursors among clusters of patients with the diagnosis. The smallest version of the dataset has almost 40 million RDF triples, and the largest is about 100 times bigger, mirroring the size of all the Mayo Clinic’s actual autism data.