Man and machine learning: Data projects and the opportunities for developers

As businesses increasingly move their operations to the cloud, they’re recognizing the potential to harness near-limitless compute power and to tap into artificial intelligence (AI) and machine learning technologies that deliver insights and value previously beyond their reach.

Businesses have never been in a better position to create value from the vast amounts of data they hold. Developers with the skills and knowledge to unlock this value are therefore in a prime position. But how should businesses approach such projects? Here are four tips for development teams and data scientists who want to help firms bring this value to market.

1. Agree on the use case
It’s imperative to be clear on the objectives for any AI project upfront. Use cases for AI fall into three main areas. First, there are projects designed to improve customer engagement and serve up personalized recommendations to customers. Second, business analysis projects optimize processes and support decision-making. Third, operational AI projects digitize entire processes to deliver increased efficiency, reduced costs, and other savings.

Being clear about the scope of the project and how success will be measured is paramount. Targets could include a metric to reduce processing failures, to reduce the timeframe for a specific process, or to increase revenues by a certain percentage.

I’d recommend starting small, perhaps with one team in one geography. Proving the use case works in a particular scenario can allow initial success to be quickly demonstrated. The scope and scale of the project can then be gradually expanded – with the business value measured at every stage. This approach also allows for ‘fast failure’ so that if something isn’t working, resources can be re-directed and the team can start again.

2. Get your Agile game on
If data projects are to succeed once use cases are established, the right teams must be assembled. In my experience, Agile Scrum teams are the most effective. Take a nine-person team as an example. The breakdown of core disciplines should be as follows:

First, a business analyst (BA) must take charge of establishing the use case for the project, understanding the ideation around it, and feeding this back to the rest of the team. Through this process, clear objectives can be set – particularly the key results for the client, but also what is achievable in each development sprint.

Next, and perhaps most important, are the data scientists. In the scenario set out above, four would be the optimum number – and this is by no means an overrepresentation. As with any data project, 70% to 80% of the work involves cleaning and arranging the data so that it can serve the use case agreed at the start. Furthermore, unlike regular software products that are built once and then deployed, data projects demand continuous deployment because the underlying data keeps changing.
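To make that cleaning work concrete, here is a minimal sketch – deduplicating records, dropping incomplete rows, and normalizing formats. The field names and rules are hypothetical, and a real project would likely lean on a dedicated library, but the shape of the work is the same:

```python
from datetime import datetime

def clean_records(raw_records):
    """Deduplicate, drop incomplete rows, and normalize field formats."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        # Drop rows missing required fields
        if not rec.get("customer_id") or not rec.get("signup_date"):
            continue
        # Normalize: trim whitespace, lowercase emails, parse dates
        rec = {
            "customer_id": rec["customer_id"].strip(),
            "email": rec.get("email", "").strip().lower(),
            "signup_date": datetime.strptime(
                rec["signup_date"].strip(), "%Y-%m-%d"
            ).date(),
        }
        # Deduplicate on customer_id, keeping the first occurrence
        if rec["customer_id"] in seen:
            continue
        seen.add(rec["customer_id"])
        cleaned.append(rec)
    return cleaned

raw = [
    {"customer_id": " c1 ", "email": "A@Example.com", "signup_date": "2021-03-01"},
    {"customer_id": "c1", "email": "a@example.com", "signup_date": "2021-03-01"},
    {"customer_id": "c2", "email": "b@example.com", "signup_date": ""},
]
print(clean_records(raw))
```

Even this toy version shows why the work dominates the schedule: every source system brings its own missing fields, formats, and duplicates, and the rules have to be revisited each time the data changes.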

Two machine learning engineers will be responsible for building the data pipeline. Finally, two QA members, with specific knowledge of the use case agreed upon at the start, complete the team.

3. Use the right data
One of the major concerns of any data project is data sensitivity. AI and machine learning algorithms need significant amounts of data to produce good results; the more data, the better the results. But there are of course limitations on the types of data that can legitimately be used.

Regulations and privacy concerns are the biggest issues to contend with. Where a data set contains private information that can provide significant value for machine learning, it’s essential to approach this in the right way. This could include anonymizing sensitive data before running the analysis.
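One illustrative approach is to replace sensitive identifiers with salted hashes before analysis, so records stay joinable across tables without exposing raw values. This is pseudonymization rather than full anonymization, and the salt below is a placeholder – in practice the key would live in a secrets store:

```python
import hashlib
import hmac

# Secret salt kept outside the data set (placeholder value for illustration)
SALT = b"replace-with-a-secret-key"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash. The same input always
    maps to the same token, so joins still work, but the raw value
    never appears in the analysis data set."""
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane@example.com", "purchase_total": 42.50}
record["email"] = pseudonymize(record["email"])
print(record)
```

Whether this is sufficient depends on the regulation in question – under GDPR, for instance, pseudonymized data is still personal data – so the approach should be agreed with legal and compliance teams before any analysis runs.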

Given that data is ever changing, and that data projects follow a process of continuous delivery, the best way to validate a use case is to start small. Once the scope of a data project is validated it can then be rolled out more widely, constantly expanding but always scaled to achieve the key objectives set from the start.

Scaling up can change the context of the data, as will dealing with different customers. It might be possible to build a very accurate model for one customer, but the same model may perform poorly for another. So the model must be adapted and retrained accordingly, then maintained once deployed. This is one of the key differences between data projects and traditional software: roughly 60% of the work comes after deployment, largely due to maintenance requirements.
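One simple way to catch that per-customer divergence is to track model accuracy per customer and flag any segment that drops below an agreed threshold for retraining. A minimal sketch, with illustrative data and a hypothetical 80% target:

```python
def accuracy_by_customer(predictions):
    """predictions: list of (customer_id, predicted, actual) tuples."""
    hits, totals = {}, {}
    for customer, predicted, actual in predictions:
        totals[customer] = totals.get(customer, 0) + 1
        hits[customer] = hits.get(customer, 0) + (predicted == actual)
    return {c: hits[c] / totals[c] for c in totals}

def needs_retraining(scores, threshold=0.8):
    """Flag customers whose model accuracy has fallen below the target."""
    return [c for c, acc in scores.items() if acc < threshold]

# Illustrative predictions for two hypothetical customers
preds = [
    ("acme", 1, 1), ("acme", 0, 0), ("acme", 1, 1), ("acme", 0, 0),
    ("globex", 1, 0), ("globex", 0, 1), ("globex", 1, 1), ("globex", 0, 0),
]
scores = accuracy_by_customer(preds)
print(needs_retraining(scores))
```

In practice this kind of check would run on a schedule against live predictions, with the threshold tied to the success metrics agreed at the start of the project.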

This is an issue that often leads to timeframes expanding beyond initial targets. In a regular development project, you can predict with some degree of accuracy how long it will take to deliver the end product, as there is a clear understanding of the software. When it comes to data projects, uncertainty should be expected, as the more data that is gathered, the higher the risk that the overall context will change.

Transparency is key. Being open about the nature of data projects from the outset will help to maintain a good relationship with the customer. Bringing them into the process early and piloting the solution as outlined above will reduce the risk of surprises down the line. As long as there is a clear commitment to solving the problem you agreed to solve, friction can be avoided.

4. Take to the cloud
Data-minded developers are in an era that is entirely theirs to own. Open-source tools such as TensorFlow, and cloud platforms such as Microsoft Azure, Google Cloud, AWS and Alibaba Cloud, are providing strong support for AI and machine learning projects. In my experience, developers working with DevOps tools and techniques are the most adept at creating value from data propositions, as they are most familiar with open-source tools and increased automation, as well as the cloud platforms that marry the two.

These platforms offer major advantages when it comes to data projects. Training machine learning models requires massive infrastructure. The graphics processing units (GPUs) that power deep learning, for example, can be very expensive to buy and operate, whereas cloud platforms can provide the same capabilities on demand for a fraction of the price.

So, the time is right for developers and data scientists with the knowledge and skills demanded by data projects to bring new value to businesses. The pressure on organizations to innovate at pace has never been greater, and data – when used effectively – can deliver this like never before.

This blog previously appeared on SDTimes.