This is the first of many project recaps we’re writing to give future students and corporate partners insight into the kinds of projects SIT Academy’s Data Science students work on during their Capstone Project.
SIT Academy’s batch #7 (May 13, 2019 - July 31, 2019) of Data Science students worked on five projects that were provided by our industrial partners, such as Swiss International Airlines, Qard and PriceHubble. All projects involved Machine Learning and two included Deep Learning. Together they covered a broad range of Data Science applications. Below is a list with some details.
As a FinTech startup, Qard analyzes e-commerce businesses applying for loans and uses a data-driven approach to identify those with a high risk of defaulting on their loan payments. Qard would like to extend this system to use non-financial data. For this project, SIT Academy's students worked on extracting e-commerce-specific non-financial data from around 400 GB of structured and unstructured data that Qard has collected over the years. The students reached an accuracy of around 70% in identifying default cases using non-financial data alone. Such a system could help loan providers in general, because they would not need to ask borrowers for specific details about their finances.
Personal student project
This was an independent project proposed by one of two students who hold PhDs in similar fields. Image-based genetic perturbation screens are regularly used in research labs to identify markers of cancer-causing genes. Such screens generate petabytes of data (millions of images) and require automated systems to analyze them. The two students wanted to test whether Deep Learning, primarily Convolutional Neural Networks and Variational Auto-Encoders, could automatically classify images into their categories of interest. Since no labeled data was available, the students used active learning to build their training and test sets incrementally. The supervised approach produced an accuracy of >90%. A second, unsupervised approach based on auto-encoders needs further exploration, but was already able to generate realistic-looking images...
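To make the active learning idea concrete, here is a minimal sketch of pool-based uncertainty sampling. It is not the students' pipeline: a nearest-centroid classifier on 2D points stands in for the Convolutional Neural Network, and the `oracle` function is a hypothetical stand-in for the human expert who labels each queried image. All names and the toy data are assumptions made for illustration.

```python
import math


def fit_centroids(labeled):
    """Compute one mean point per class from (point, label) pairs."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in by_class.items()
    }


def predict(point, centroids):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def margin(point, centroids):
    """Gap between the two nearest centroids; a small gap means high uncertainty."""
    distances = sorted(math.dist(point, c) for c in centroids.values())
    return distances[1] - distances[0]


def active_learning(pool, oracle, seeds, rounds):
    """Grow a labeled set by repeatedly querying the most uncertain pool item.

    `seeds` must contain at least one example of each of two classes.
    """
    labeled = [(p, oracle(p)) for p in seeds]
    unlabeled = [p for p in pool if p not in seeds]
    for _ in range(rounds):
        if not unlabeled:
            break
        centroids = fit_centroids(labeled)           # retrain on current labels
        query = min(unlabeled, key=lambda p: margin(p, centroids))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))       # ask the "expert" for a label
    return fit_centroids(labeled), labeled


# Toy pool: two well-separated 3x3 grids of points (illustrative data only).
low = [(x / 2, y / 2) for x in range(3) for y in range(3)]
high = [(2 + x / 2, 2 + y / 2) for x in range(3) for y in range(3)]
pool = low + high


def oracle(point):
    """Hypothetical annotator: class 1 if the point lies above the line x + y = 3."""
    return int(point[0] + point[1] > 3)


model, labeled = active_learning(pool, oracle, seeds=[(0.0, 0.0), (3.0, 3.0)], rounds=4)
print(len(labeled), "labels collected")
```

The point of the loop is that labeling effort goes to the examples the current model is least sure about, so a useful training set can be built with far fewer labels than annotating the whole pool.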
This project applied Active Learning with Convolutional Neural Networks to automatically classify property images into different price categories. Multiple pre-trained networks (e.g., ResNet and VGG16) were used as a starting point and then trained further on our data. Using pre-trained networks is standard practice in Deep Learning-based image analysis. The student who worked on the challenge achieved an accuracy of ~93% on this data.
for SIT Academy
As an EdTech startup, SIT Academy often looks for ways to use data to help our students meet their learning needs. The central aim of this project was to identify the skill sets required for technology-related jobs in Switzerland, match them with the job seeker’s own skills and background (as extracted from LinkedIn profiles), and finally suggest suitable positions or training programs to the job seeker. To this end, the students employed NLP techniques to measure the semantic similarity between job ads and candidate skills, something most job recommendation services lack. SIT Academy is now developing this project into an online tool to help not only our students but also the general Swiss public.
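As a rough illustration of matching a candidate profile against job ads, the sketch below ranks ads by cosine similarity over TF-IDF vectors. This is an assumption-laden simplification: TF-IDF only captures word overlap, whereas the project aimed at semantic similarity (which typically relies on learned text embeddings), and the recap does not say which technique the students used. The function names and the sample texts are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word-like tokens (keeps '+' and '#' for C++, C#)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for a list of texts."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_jobs(profile, job_ads):
    """Return (score, ad) pairs sorted from best to worst match for the profile."""
    vectors = tfidf_vectors([profile] + job_ads)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sorted(zip(scores, job_ads), key=lambda pair: pair[0], reverse=True)


# Invented sample data, for illustration only.
profile = "python machine learning pandas sql"
job_ads = [
    "Frontend developer: javascript, react, css",
    "Data scientist: python, machine learning, sql",
]
for score, ad in rank_jobs(profile, job_ads):
    print(f"{score:.2f}  {ad}")
```

A lexical method like this misses matches between differently worded but related skills, which is exactly the gap a semantic approach is meant to close.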