How OpenAI’s capabilities in NLP can differentiate Microsoft’s commercial offerings
1) Managing unstructured data
The exponential growth of data in the last decade has created significant challenges for enterprise technology and data teams, placing increasing pressure on internal systems that are often not architected to meet this level of scale and speed.
While this growth creates opportunities for innovation, the underlying challenges continue to compound. For example, Statista estimates that the volume of data created globally surged from 2 zettabytes in 2010 to 47 zettabytes in 2020. This growth is only going to accelerate: by 2035, the amount of data is predicted to reach an astronomical 2,142 zettabytes (Figure 1). To put this in perspective, a single zettabyte is equivalent to 1 billion terabytes.
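To make these figures easier to compare, the short sketch below converts zettabytes to terabytes and derives the compound annual growth rate implied by the Statista numbers quoted above. The helper name `cagr` and the assumption of smooth year-over-year compounding are illustrative choices, not part of the Statista study.

```python
# Illustrative arithmetic only; cagr() and the constant-compounding
# assumption are ours, not Statista's.
TB_PER_ZB = 10**9  # 1 zettabyte = 1 billion terabytes


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1


historic = cagr(2, 47, 2020 - 2010)      # 2 ZB (2010) -> 47 ZB (2020)
forecast = cagr(47, 2142, 2035 - 2020)   # 47 ZB (2020) -> 2,142 ZB (2035)

print(f"2010-2020 implied growth: {historic:.1%} per year")
print(f"2020-2035 implied growth: {forecast:.1%} per year")
print(f"47 ZB = {47 * TB_PER_ZB:,} TB")
```

Under that assumption, the implied rate works out to roughly 37 percent a year for 2010 to 2020 and roughly 29 percent a year for 2020 to 2035. The forecast therefore adds far more data each year in absolute terms, even though the percentage rate is lower.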

It is at this intersection of exponential growth and constrained capacity that Microsoft and OpenAI have a strategic opportunity to improve how data is managed. MIT Sloan reports that 80 to 90 percent of data is unstructured, often existing as text buried in email threads, chat messages, documents, support notes, and so on. Given that so much unstructured text already exists in Microsoft products, training GPT-3-based models on these data sets and integrating these capabilities directly into Microsoft applications could give organizations new ways to extract, summarize, and analyze this information at scale.
In the spirit of this article’s topic, I asked ChatGPT how it thinks OpenAI’s LLMs could impact the management of unstructured data. It responded with the following examples:
Integrating OpenAI’s language models into business applications and systems can improve the management of unstructured data in several ways:
- Text classification: OpenAI’s language models can be trained to classify unstructured text data, such as emails, customer feedback, and social media posts, into relevant categories. This can help businesses better understand and respond to customer needs and concerns.
- Text summarization: OpenAI’s language models can be used to automatically summarize large amounts of unstructured text data, making it easier for businesses to quickly understand key information and identify patterns or trends.
- Named Entities Recognition: OpenAI’s language models can be used to extract useful information such as names, locations, and organizations from unstructured text data, and can be used for various use cases like customer segmentation, fraud detection, and more.
- Sentiment Analysis: OpenAI’s language models can be used to analyze the sentiment of unstructured text data, such as customer feedback or social media posts, which can help businesses understand how customers feel about their products or services.
Beyond these examples, there are numerous other applications that could deliver value. This partnership presents an immense opportunity for Microsoft and OpenAI to improve the management of unstructured data, ushering in a new age in which natural language processing (NLP) solutions help organizations see the exponential growth of data as a catalyst for innovation instead of an anchor on operations.
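As a rough illustration of how the four tasks listed above might be wired into an application, the sketch below maps each task to a prompt template and sends the result to a language model. Everything here is hypothetical: the template wording, the names `PROMPT_TEMPLATES`, `build_prompt`, and `run_task`, and the `complete` callable, which stands in for whatever model endpoint an organization actually uses. It is not a Microsoft or OpenAI API.

```python
from typing import Callable, Sequence

# Hypothetical prompt templates, one per task from the list above.
PROMPT_TEMPLATES = {
    "classification": (
        "Classify the following text into one of these categories: {options}.\n\n"
        "Text: {text}\nCategory:"
    ),
    "summarization": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\nSummary:"
    ),
    "entity_recognition": (
        "List the names, locations, and organizations mentioned in the "
        "following text.\n\nText: {text}\nEntities:"
    ),
    "sentiment": (
        "Label the sentiment of the following text as one of: {options}.\n\n"
        "Text: {text}\nSentiment:"
    ),
}


def build_prompt(task: str, text: str, options: Sequence[str] = ()) -> str:
    """Fill in the template for a task; options are only used where the template asks for them."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text, options=", ".join(options))


def run_task(
    task: str,
    text: str,
    complete: Callable[[str], str],
    options: Sequence[str] = (),
) -> str:
    """Send the prompt to `complete` (an assumed model call) and tidy up the reply."""
    return complete(build_prompt(task, text, options)).strip()


def fake_complete(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return " negative\n"


label = run_task(
    "sentiment",
    "The update broke my inbox rules.",
    fake_complete,
    options=("positive", "neutral", "negative"),
)
print(label)
```

Passing the model call in as a plain function keeps the prompt logic independent of any particular provider, so `fake_complete` can be swapped for a real client without changing the rest of the sketch.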
2) Improving Developer Productivity
In 2018, Microsoft made a strategic acquisition with the purchase of GitHub for $7.5 billion. At the time, many industry analysts were skeptical of the deal’s price tag, especially given that GitHub was acquired for nearly 30x its annual recurring revenue. However, as Paul Weinstein pointed out in the Harvard Business Review, Microsoft did not pay $7.5 billion for GitHub’s financial value, but for its loyal following of developers who use it on a daily basis. Satya Nadella made a similar point during the acquisition announcement, emphasizing the importance of the GitHub community:
More than 28 million developers collaborate on the platform, and it is home to more than 85 million code repositories used by people in nearly every country. From students to hobbyists to startups to large organizations, GitHub is the destination for developers to learn, share, and work together to create software.
Fast forward to 2023: with the deepening of Microsoft and OpenAI’s partnership, the GitHub acquisition appears even more strategic. It highlights Microsoft’s long-term vision and commitment to investing in communities and platforms. Since OpenAI released ChatGPT, developers on GitHub have already discovered ingenious ways to use the two solutions in parallel to improve things like:
- Developing code based on natural language inputs to help developers write code more quickly and accurately.
- Code reviews to improve and automate parts of the QA process.
- Explaining code across teams to improve collaboration.
- Creating technical documentation.
- Issue tracking, including automated summaries of bugs and suggestions for potential fixes.
While many of these use cases are new with ChatGPT’s recent general launch, Microsoft has been using OpenAI’s LLMs to power its own products since the start of the partnership. OpenAI’s LLMs already power tools like GitHub Copilot, which helps developers write code using OpenAI Codex, a descendant of GPT-3. With the deepening of the partnership between OpenAI and Microsoft, there is even greater potential to improve GitHub’s commercial offerings by integrating OpenAI’s LLMs, like ChatGPT, directly into the platform. This integration could create a positive feedback loop: ChatGPT-powered developer tools improve as they gain access to more real data and edge cases, which improves developer productivity, which in turn increases the use of these tools.
3) Further Differentiating Azure’s ML capabilities
This week, Microsoft announced the general availability of the Azure OpenAI Service, emphasizing the company’s ongoing focus on democratizing AI. This announcement presents a range of opportunities to improve the development and execution of AI in the enterprise. This is a compelling proposition, as the successful execution of AI projects within large organizations has proven to be challenging. According to Gartner, only 53% of AI projects make it from prototype to production, underscoring the need for better solutions and methodologies.
One of the reasons the combination of OpenAI’s LLMs and Azure’s ML platform is so powerful is that it can both add value to the ML development lifecycle and provide out-of-the-box capabilities that can be adopted and refined through APIs. The ML development lifecycle includes numerous steps that data scientists typically follow, such as data preparation, building and training models, validating and deploying models, and managing and monitoring models for ongoing performance, as highlighted in Figure 2.

In the data preparation phase, OpenAI’s LLMs, like GPT-3, could be used to automate the labeling of text data and simplify the data cleaning process by identifying and correcting errors. Furthermore, GPT-3 could help organizations build and train new ML capabilities by providing access to pre-trained models that can be used for various natural language processing tasks, and by generating synthetic data when real-world data is limited.
Beyond enhancing the ML development lifecycle, the partnership between OpenAI and Microsoft is also improving the ML capabilities offered through Azure Cognitive Services. This can increase the adoption of ML both by providing consumable APIs that can be used out-of-the-box and by allowing developers to build on top of these pre-existing models. For example, OpenAI’s advanced understanding of language and context could improve the performance of speech-to-text transcription, language translation, cognitive search, entity recognition, sentiment analysis, and conversational language understanding, all of which are capabilities within the Azure platform. The continued integration of OpenAI’s LLMs, like GPT-3, into Azure services will enable developers to build on top of these models to create industry-specific solutions, trained on industry-specific data. For instance, in healthcare, organizations could refine GPT-3 to enhance the accuracy and efficiency of analyzing electronic health records (EHRs) and summarizing medical journals for research purposes. In telecommunications, organizations could develop OpenAI-powered solutions through Azure to improve self-service capabilities, summarize support notes, and analyze customer sentiment across languages on a global level.
In conclusion, the partnership between Microsoft and OpenAI represents a major step forward in the field of ML. The collaboration between these two industry leaders promises to drive significant advancements in how organizations develop and apply AI to reinvent their industries and deliver breakthrough innovations. While we are still in the early stages of this journey, the release of ChatGPT has captured the world’s imagination, demonstrating the staggering capabilities of large language models and the transformative potential of natural language processing.
Sources:
- Wall Street Journal – “Microsoft Plans to Build OpenAI Capabilities into All Products” (https://www.wsj.com/articles/microsoft-plans-to-build-openai-capabilities-into-all-products-11673947774)
- Azure – “General availability of Azure OpenAI service expands access to large advanced AI models with added enterprise benefits” (https://azure.microsoft.com/en-us/blog/general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/)
- MIT Sloan School of Management – “Tapping the Power of Unstructured Data” (https://mitsloan.mit.edu/ideas-made-to-matter/tapping-power-unstructured-data)
- Statista – “Global data creation forecasts” (https://www.statista.com/chart/17727/global-data-creation-forecasts/)
- Harvard Business Review – “Why Microsoft is Willing to Pay So Much for GitHub” (https://hbr.org/2018/06/why-microsoft-is-willing-to-pay-so-much-for-github)
- Microsoft – “Microsoft and GitHub Conference Call” (https://www.microsoft.com/en-us/Investor/events/FY-2018/Microsoft-and-GitHub-Conference-Call?EventID=25596)
- Gartner – “Gartner identifies the top strategic technology trends for 2021” (https://www.gartner.com/en/newsroom/press-releases/2020-10-19-gartner-identifies-the-top-strategic-technology-trends-for-2021)
- Nvidia – “What is Synthetic Data?” (https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/)
- Azure – “Azure Machine Learning” (https://azure.microsoft.com/en-us/products/machine-learning/)