At SIT Academy we care a lot about teaching skills that are relevant in practice, and we want our students to apply their newly acquired knowledge to real-world problems. For this reason, every SIT Academy Data Science student works for 3.5 weeks on a capstone project, which is usually provided by a company in our network.
To our knowledge this is unique in the world of bootcamps, and it has proved to be very rewarding - both for our students and for the companies that provide projects.
SIT Academy's Data Science batch #11 (May 11, 2020 - July 31, 2020) worked on five projects that were provided by our industrial partners, such as Nispera, Intelligencia.ai, Sentifi and Advertima. Read more about each individual project below.
Developing a drug can cost up to $2 billion and take 15 years, with no guarantee of market entry. For this reason, it is very important for pharmaceutical companies to understand which disease to treat and to test the drug against. Marie Bocher, Mariana Zorkina and Seamus Dines developed a tool that aims to assist pharmaceutical companies in making this decision by summarizing the necessary medical information and forecasting clinical trial trends. For a more in-depth explanation and to use the application itself, click here.
Alternative data in finance is any type of data, outside of traditional market data and financial indicators, that can be used to provide insight into investments: for example social media posts, web activity or even satellite images. Sentifi is an alternative-data provider that specializes in turning social media activity into numerical scores, which represent how the market views any individual stock or sector. Valentine Herzl and Amine Chbani were challenged to derive insights from the data provided by Sentifi and to use these insights to build a trading algorithm that yields better returns than the S&P 500 index. By focusing on longer-term returns and combining individual stock scores with sector scores, they managed to build two models (a statistical model and a regression model), both of which beat the S&P 500 returns for the years 2017-2020. While the S&P 500 rose by about 40% over this time span, they were able to create a model that, based on price and sentiment data, generated a return of 108%.
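To make the comparison above a little more concrete, here is a minimal sketch of how per-period returns compound into a total return, and of one simple way a stock score and a sector score could be blended. The function names, the weighting rule and the default weight are hypothetical illustrations; the students' actual models are not described here and may work quite differently.

```python
from math import prod


def cumulative_return(period_returns):
    """Compound simple per-period returns (0.10 means +10%) into one total return."""
    return prod(1.0 + r for r in period_returns) - 1.0


def blended_score(stock_score, sector_score, stock_weight=0.5):
    """Weighted average of a stock-level and a sector-level sentiment score.

    Hypothetical rule for illustration only; the weight is an assumption.
    """
    return stock_weight * stock_score + (1.0 - stock_weight) * sector_score


def excess_return(strategy_total, benchmark_total):
    """Difference between a strategy's total return and a benchmark's total return."""
    return strategy_total - benchmark_total
```

With the figures quoted above, `excess_return(1.08, 0.40)` gives the gap between the model's 108% and the index's roughly 40%, that is about 68 percentage points.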
Advertima is a computer vision and machine learning company that focuses on real-time visual interpretation of human behaviour in the physical world. It provides stores with sensor-equipped monitors that display ads targeted to consumers and help drive increased sales.
Sarah Kurmulis, Olena Levchun and Matthieu Bornet were tasked with understanding, cleaning and engineering the data from multiple stores, and with building Machine Learning and Deep Learning models to predict whether customers will be present in the stores, their physical position, and their attention to the ad monitor over the next few seconds. Their models reached high levels of accuracy for predicting presence (86% mean accuracy at 5 seconds) and position in stores (8 cm mean position accuracy for a 1 s forecast, 112 cm for a 5 s forecast), and they provided valuable insights into zones of attention to the screen. These results will enable the company to further improve its targeted ad system.
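The figures above can be read with the help of a small sketch. It assumes that "mean position accuracy" means the average straight-line distance between predicted and observed positions, and that presence accuracy is the share of correct present/absent predictions; neither definition is confirmed by the project description, and the function names are hypothetical.

```python
from math import hypot


def mean_position_error(predicted, actual):
    """Average Euclidean distance between predicted and observed (x, y) positions.

    Assumed definition; the result is in the same unit as the inputs (e.g. cm).
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [
        hypot(px - ax, py - ay)
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return sum(distances) / len(distances)


def presence_accuracy(predicted, actual):
    """Fraction of time steps where predicted presence matches observed presence."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)
```

Under these assumptions, a lower position error and a higher presence accuracy both indicate a better forecast, which is why the error grows as the forecast horizon moves from 1 s to 5 s.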
This project was a collaboration with Paul Windisch, a physician at Kantonsspital Winterthur and former SIT Academy Data Science student. In order to recognise brain tumours in MR (magnetic resonance) images of the brain more easily, Cornelia Schmitz and Norbert Bräker built a classification network to group the images based on two properties: the perspective and the MR sequence used to generate each image. Two approaches were used, one implementing a newer architecture called Siamese Networks and one applying the more standard method of using a pre-trained network as a base. The students achieved a classification accuracy of ~94%-98% for perspective and sequence. The work done in this project will help Paul Windisch in his own research on the automatic detection of brain tumours, a project that he started together with Pascal Weber while he was an SIT Academy student and which is now funded by SNSF (Swiss National Science Foundation) and Innosuisse (Schweizerische Agentur für Innovationsförderung).
Nispera is a Zurich-based company providing data intelligence services for renewable energy plants worldwide. Dave Lonsdale, Daniel Gisler and Konstantinos Kirtsonis were challenged to see if they could identify the energy losses on a large photovoltaic (PV) plant due to soiling (the slow accumulation of dirt on the panels) using standard operational data from the plant. The plants examined were in the Atacama Desert in Chile, where a relatively low level of soiling is incurred (~3% per annum). They deployed several statistical and ML clustering techniques to see if there was sufficient signal in the data. The toolkit that they developed provides a pipeline that cleans up very noisy data and can be applied to datasets from other plants. They applied the pipeline to two plants and made several important observations about operational performance: PV saturation in summer negated soiling effects; there was a non-linear performance improvement in winter that requires further investigation; and, on the second plant, there was a notable performance improvement (perhaps due to a rain event or cleaning), which is being followed up with the plant owner. Although it was not possible to detect the low level of soiling on these plants, Nispera is keen to continue the project as there is no known solution in the market.
We thank all the companies that have been involved in the projects for their support. Their involvement gives our students a unique opportunity to grow their knowledge and skills.