
Measuring AI: Beyond Size and Data
As businesses increasingly look to artificial intelligence to enhance operations, how they evaluate an AI system's intelligence has become critical. For years, the standard yardsticks were the volume of training data and the size of the model. However, Jonathan Frankle, the chief AI scientist at Databricks, argues that this perspective is fundamentally flawed.
In the context of emerging agentic AI, described as the next evolutionary step beyond generative AI, there is a pressing need to measure efficacy by real-world performance rather than theoretical capability. Frankle explains that what matters most is how AI behaves in practical applications. Trust and return on investment stem from how the AI performs once deployed, not from the sheer amount of information it has learned.
Why Traditional Metrics Fall Short
Many enterprises rely too heavily on public benchmarks, which can paint an incomplete picture of an AI's actual utility. Frankle points out that AI operates differently from traditional software: its outputs are probabilistic, so the same input can produce different results from one run to the next. Judging a model by the data it was trained on therefore says little about its potential or its pitfalls.
He emphasizes that AI deployment should be treated with the same rigor as software development. Currently, many businesses adopt a casual approach by running quick tests and evaluating the ‘vibes’ of AI models before deployment. This practice invites risk, and Frankle insists that such a haphazard method shouldn't suffice for something as impactful as AI.
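One way to make a quick "vibes" check repeatable, given that outputs are probabilistic, is to run the same prompt many times and record how often the answer is acceptable. The sketch below is illustrative only and is not taken from Databricks: `pass_rate`, `make_toy_model`, and the refund prompt are hypothetical names, and the toy model stands in for a call to a real model.

```python
import random
from typing import Callable


def pass_rate(
    model: Callable[[str], str],
    prompt: str,
    is_acceptable: Callable[[str], bool],
    trials: int = 20,
) -> float:
    """Call the model `trials` times on one prompt and return the
    fraction of outputs that the acceptance check approves."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if is_acceptable(model(prompt)))
    return passes / trials


def make_toy_model(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model: answers correctly about
    80% of the time, which mimics non-deterministic output."""
    rng = random.Random(seed)

    def toy_model(prompt: str) -> str:
        return "Refund approved" if rng.random() < 0.8 else "I am not sure"

    return toy_model


if __name__ == "__main__":
    rate = pass_rate(
        make_toy_model(seed=0),
        "Customer returned an item within 30 days. What should we do?",
        lambda answer: "approved" in answer.lower(),
        trials=200,
    )
    print(f"Acceptable answers: {rate:.0%}")
```

A single spot check would report either a pass or a fail here; the repeated run shows a rate, which is a more honest description of a system whose answers vary.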
Implementing a Pragmatic Approach to AI Measurement
To capitalize on AI's potential, businesses need to prioritize thorough evaluations rooted in their own context. This means running rigorous assessments on business-specific data and using the results to refine the AI. With this approach, managers can check that an AI's outputs not only meet general expectations but also fit their particular operational demands.
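An assessment on business-specific data can be as simple as a list of the company's own test cases, each paired with a check, plus a pass threshold that must be met before deployment. The following is a minimal sketch under stated assumptions: `EvalCase`, `evaluate`, the 0.9 threshold, and the sample support questions are all hypothetical, and `stub_model` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One business-specific test: a prompt plus a check on the answer."""
    prompt: str
    check: Callable[[str], bool]


def evaluate(model: Callable[[str], str], cases: list[EvalCase],
             threshold: float = 0.9) -> dict:
    """Run every case once and report the score, whether it clears the
    (assumed) deployment threshold, and which prompts failed."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    results = [case.check(model(case.prompt)) for case in cases]
    score = sum(results) / len(results)
    failures = [c.prompt for c, ok in zip(cases, results) if not ok]
    return {"score": score, "passed": score >= threshold, "failures": failures}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a deployed model."""
    canned = {
        "What is our return window?": "Returns are accepted within 30 days.",
        "Do we ship to Canada?": "Yes, we ship to Canada.",
    }
    return canned.get(prompt, "I don't know.")


# Hypothetical cases written from the business's own policies.
cases = [
    EvalCase("What is our return window?", lambda a: "30 days" in a),
    EvalCase("Do we ship to Canada?", lambda a: a.lower().startswith("yes")),
    EvalCase("What is our support email?", lambda a: "@" in a),
]

if __name__ == "__main__":
    report = evaluate(stub_model, cases)
    print(report)
```

The list of failing prompts is the useful part for refinement: it points to the specific business questions the system still gets wrong.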
For small and medium-sized businesses looking to integrate AI technologies, the insights from Databricks offer a pragmatic way to measure AI's effectiveness. By focusing on behavior and context rather than raw measures such as model size and data volume, companies can build models that are not just intelligent in theory but also reliable partners in day-to-day operations.