One of the biggest highlights of our 12-week Data Science Immersive Program is, of course, the Capstone project phase! During this time, our brilliant students have a unique opportunity to solve practical Data Science problems provided by companies and research institutes throughout Switzerland. This is a fundamental part of our curriculum because at SIT Academy, we care deeply about teaching students the specific skills that are currently in demand in industry and essential for solving meaningful problems. Over the last 3.5 weeks, our students experienced the complete Data Science process, from defining their client’s business problem, to exploring the data and applying suitable Machine Learning techniques, to finally delivering a functional prototype to the company. The culmination of all the hard work that goes into these Capstone projects is a public presentation in front of family members, friends, companies, fellow students, and anyone else who wants to attend. You can register for the next presentations via the SIT Academy Meetup page.
SIT Academy's Data Science batch #13 (November 9, 2020 - February 12, 2021) worked on six projects provided by our industrial partners, such as Swiss Data Alliance, Pipra, ecoinvent, and Syngenta. Read more about the individual projects below.
Data Innovation Alliance - Web-scraping and Visualization for DataInnovation
Student: Tiffany Carruthers
The Data Innovation Alliance is an organization that fosters innovation and collaboration within the data sector. It does so by hosting workshops and conferences and by connecting organizations and universities with similar interests to develop new data-centric products.
In this project, Tiffany designed interactive visualizations for the Alliance’s website. She gathered information by web scraping with Python, applied natural language processing to classify the information, and visualized her findings using D3.js.
1) This is a force-directed graph; it shows the relationships between all the projects and the involved organizations.
2) This visualization is a circle-packing diagram, which shows the 13 data-related expert groups as large bubbles. Each of these bubbles contains several smaller bubbles representing various projects associated with that expert group. An NLP model was developed to predict the assignment of each project to a particular expert group.
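As a rough illustration of the data preparation behind a circle-packing diagram: D3's hierarchy layouts read a nested tree in which each node lists its children, so project-to-group assignments first have to be nested under their expert groups. The sketch below is not Tiffany's code. The function name, the project and group labels, and the choice of giving every project an equal size of 1 are all assumptions made for the example.

```python
import json
from collections import defaultdict


def build_hierarchy(assignments, root_name="expert groups"):
    """Nest (project, expert_group) pairs into a {name, children} tree."""
    groups = defaultdict(list)
    for project, group in assignments:
        # Each project is a leaf; "value" sets its bubble size (assumed equal here).
        groups[group].append({"name": project, "value": 1})
    return {
        "name": root_name,
        "children": [
            {"name": group, "children": projects}
            for group, projects in sorted(groups.items())
        ],
    }


# Hypothetical assignments, e.g. the predictions of a text classifier.
assignments = [
    ("Project A", "Group 1"),
    ("Project B", "Group 1"),
    ("Project C", "Group 2"),
]

tree = build_hierarchy(assignments)
print(json.dumps(tree, indent=2))
```

On the JavaScript side, a tree of this shape can be passed to d3.hierarchy and sized with .sum() on the value field before applying d3.pack.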
PIPRA - Precision Medicine: Predicting postoperative delirium (POD) with ML Models
More than 40% of patients aged over 60 are affected by the cognitive disorder postoperative delirium (POD). Patients suffering from POD may experience disorientation, difficulty in speech, and even memory loss, which often result in longer hospital stays, higher hospital costs, and lower quality of life.
To help doctors make better-informed medical decisions, Elena, Daniel, and Zana developed a Machine Learning-based solution that assesses a patient’s risk of developing POD before surgery. They used a Decision-Tree Model combined with the AI explainability method SHAP, which identifies the specific features of the patient’s medical data that contributed to the final prediction. With this knowledge, doctors have the opportunity to choose alternative or preventive measures for a patient before going ahead with surgery.
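To give a feel for what a SHAP-style explanation reports, the sketch below computes exact Shapley values for a tiny rule-based risk function by averaging each feature's marginal contribution over all feature orderings. This is an illustration only, not the students' model and not the SHAP library: the feature names, thresholds, risk numbers, and the baseline patient are all invented for the example and have no clinical meaning.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        current = dict(baseline)  # start from the reference patient
        previous = predict(current)
        for f in order:
            current[f] = instance[f]  # reveal one feature of the real patient
            value = predict(current)
            totals[f] += value - previous
            previous = value
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in totals.items()}


def toy_risk(x):
    # Hypothetical tree-like rules; NOT a clinical model.
    if x["age"] > 75:
        return 0.6 if x["prior_cognitive_impairment"] else 0.3
    return 0.2 if x["long_surgery"] else 0.1


# Hypothetical patient and reference (baseline) patient.
patient = {"age": 80, "prior_cognitive_impairment": True, "long_surgery": False}
baseline = {"age": 60, "prior_cognitive_impairment": False, "long_surgery": False}

contributions = shapley_values(toy_risk, patient, baseline)
for feature, contribution in contributions.items():
    print(f"{feature}: {contribution:+.2f}")
```

The contributions add up to the difference between the patient's predicted risk and the baseline risk, which is what lets each prediction be broken down feature by feature. Enumerating all orderings only works for a handful of features; for tree models the SHAP library uses faster algorithms.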
ecoinvent - Automated Data Visualization
Organizations are increasingly looking to understand the end-to-end impact that a product has on the environment. For this purpose, ecoinvent provides a comprehensive life cycle assessment database to thousands of companies such as Toyota, Lego, and Procter & Gamble as well as government organizations and universities. Though the data sets are comprehensible on their own, the underlying calculations and methods used to estimate environmental impacts can be complex and overwhelming for most people.
In this project, Angela Niederberger and Sarah Dutschke helped ecoinvent make their holistic reports more user-friendly by adding data visualizations that represent the impacts and network structure of production chains. The main challenge was to make those visualizations work in a uniform way across about 60’000 diverse data sets while emphasizing the most relevant information. The Python script produced in this project will be integrated into ecoinvent’s automatic PDF generator, thereby reaching thousands of clients.
SIT Academy - Marketing campaign recommendations using SIT Academy’s online traffic data
Ranking at the top of the most commonly used search engines is an influential online marketing strategy for most companies. In this project for SIT Academy, Pawinee systematically identified keywords that align with the search engine queries made by potential customers. She began with a comprehensive analysis of keywords relevant to SIT Academy’s course offerings (English & German) to identify the best ones, then used a Search Engine Optimization tool (SEMrush) to identify new keywords from competitors. As a result, Pawinee identified keyword groups with a large potential for high click rates at moderate cost, resulting in valuable lead generation for the business.
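The trade-off between click potential and cost can be sketched as a simple filter-and-rank step. The article does not describe Pawinee's actual scoring, so everything here is an assumption made for illustration: the Keyword fields, the cost ceiling, and the placeholder phrases and numbers.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    phrase: str
    monthly_searches: int
    expected_ctr: float    # fraction of searches expected to lead to a click
    cost_per_click: float  # in CHF

    @property
    def expected_clicks(self) -> float:
        return self.monthly_searches * self.expected_ctr


def shortlist(keywords, max_cpc):
    """Keep keywords at or below the cost ceiling, highest expected clicks first."""
    affordable = [k for k in keywords if k.cost_per_click <= max_cpc]
    return sorted(affordable, key=lambda k: k.expected_clicks, reverse=True)


# Placeholder data, invented for the example.
candidates = [
    Keyword("keyword a", monthly_searches=1000, expected_ctr=0.05, cost_per_click=2.0),
    Keyword("keyword b", monthly_searches=400, expected_ctr=0.10, cost_per_click=1.0),
    Keyword("keyword c", monthly_searches=5000, expected_ctr=0.04, cost_per_click=6.0),
]

ranked = shortlist(candidates, max_cpc=3.0)
for k in ranked:
    print(f"{k.phrase}: {k.expected_clicks:.0f} expected clicks at CHF {k.cost_per_click:.2f} per click")
```

In practice the search volumes and cost estimates would come from a tool such as SEMrush rather than being typed in by hand.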