Machine Learning on Big Data for Personalized Internet Advertising
Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.
Posted by Charles Humble on Jun 25, 2008
To work around this problem, a number of static analysis tool developers are exploring the use of annotations to define these details. For example, FindBugs and IntelliJ both define their own annotations to indicate when a method may return null. The two tools use slightly different annotations, however, so there is a case for standardisation. JSR-305, led by FindBugs' creator Bill Pugh, aims to create both a standard set of annotations that analysis tools can use and a mechanism that lets developers add further annotations as they need them. The current proposal includes annotations for nullness, sign, language and threading, amongst others.
In a tech talk at Google, Bill Pugh goes through a number of concrete examples, starting with nullness. The idea is to let a method declare which parameters, return values and fields should always be nonnull, and which may legitimately be null. Pugh's solution involves three annotations:
@Nonnull - the value should be nonnull once the object is initialised.
@NullFeasible - code should always assume that this value might be null. Tools should flag any dereference that isn't preceded by a null check. If you mark a return value as @NullFeasible, you may have to make a number of other code changes as the effects of a possibly null value ripple through the code.
@UnknownNullness - equivalent to having no annotation at all, but needed to support default and inherited annotations.
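As a rough illustration of how two of these annotations might read in code, the sketch below declares local stand-ins for @Nonnull and @NullFeasible so that it compiles on its own. The stand-in declarations, the NullnessSketch class, the EMAILS table and both methods are assumptions made for illustration, not part of the JSR. The null check in emailDomain is the kind of check a tool would be expected to ask for before a @NullFeasible value is dereferenced.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

class NullnessSketch {
    // Local stand-ins for the proposed annotations (assumption: declared here only
    // so the sketch compiles without any JSR-305 library on the classpath).
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Nonnull {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface NullFeasible {}

    // Hypothetical lookup table, used purely for illustration.
    private static final Map<String, String> EMAILS = new HashMap<>();
    static {
        EMAILS.put("alice", "alice@example.com");
    }

    // The result may be null (unknown user), so callers are expected to check it.
    @NullFeasible
    static String findEmail(@Nonnull String user) {
        return EMAILS.get(user);
    }

    // Promises a nonnull result by handling the null case itself.
    @Nonnull
    static String emailDomain(@Nonnull String user) {
        String email = findEmail(user);
        if (email == null) {
            // Without this check, the dereference below is what a tool would flag.
            return "unknown";
        }
        return email.substring(email.indexOf('@') + 1);
    }

    public static void main(String[] args) {
        System.out.println(emailDomain("alice"));
        System.out.println(emailDomain("bob"));
    }
}
```

The annotations here do nothing at run time; their value lies in giving an analysis tool enough information to report a missing check without analysing findEmail itself.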
Other candidates for inclusion include sign annotations, which would indicate the allowed signs for a numeric value (@Nonnegative, @Signed, @CheckForNegative, @Positive and @CheckForNonpositive); a language annotation, which lets a developer indicate that a String holds text in a specific language (SQL, regex etc.); and threading annotations. The threading annotations would mark methods that must, or must not, be called from a specific thread or thread group. In the Google discussion group for JSR-305, Pugh described this as follows:
"In certain situations, there are operations that one would need to be called from a specific thread, or that should never be called from one. The main and most significant example of that are AWT/Swing threading issues are a major issue with GUI development in Java. Certain AWT/Swing operations must *always* be made in the event thread, or can result in all kinds of painful subtle problems, while operations that take too long which are done while in the event thread will cause the application to feel slow to the user (that is one of the main reasons that swing has gotten a reputation for being slow). In order to help with those issues, I propose the creation of a @ThreadRestrictions, that can be used to specify threads and/or thread groups that the operation should always be called from/never be called from.
Also to make developer's life easier, and since the Swing/AWT issues affect all Java desktop development, I would propose the creation of a @EventThreadOnly @NeverEventThread annotation."
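To make the proposal more concrete, here is a minimal sketch of how such threading annotations might look on ordinary methods. The @EventThreadOnly and @NeverEventThread declarations are local stand-ins, and the ThreadingSketch class and its methods are hypothetical. The run-time guards using EventQueue.isDispatchThread() are also an assumption: they only stand in for the check that an analysis tool would perform statically.

```java
import java.awt.EventQueue;
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;

class ThreadingSketch {
    // Local stand-ins for the annotations proposed in the quote (assumption).
    @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface EventThreadOnly {}

    @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface NeverEventThread {}

    // UI update: must run on the AWT event thread.
    @EventThreadOnly
    static String updateLabel(String text) {
        if (!EventQueue.isDispatchThread()) {
            throw new IllegalStateException("must be called on the event thread");
        }
        return "label=" + text;
    }

    // Slow work: must stay off the event thread to keep the UI responsive.
    @NeverEventThread
    static String loadReport() {
        if (EventQueue.isDispatchThread()) {
            throw new IllegalStateException("must not be called on the event thread");
        }
        return "report";
    }

    // Does the slow work on the calling thread, then hands the UI update
    // to the event thread and waits for it to finish.
    static String refresh() {
        String data = loadReport();
        String[] result = new String[1];
        try {
            EventQueue.invokeAndWait(() -> result[0] = updateLabel(data));
        } catch (InterruptedException | InvocationTargetException e) {
            throw new RuntimeException("event thread call failed", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(refresh());
    }
}
```

With annotations like these in place, a tool could report a direct call from loadReport to updateLabel without running the program, which is the point of the proposal.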
As well as defining a base set of annotations, JSR-305 is looking to provide a meta-annotation for type qualifiers that would allow developers to define their own attributes for the Java type system. A significant motivator is that the lack of an enumeration type in Java before version 5 resulted in a large number of Java APIs that use ints and Strings with public constants where enumerations would have been a better design choice. For example, the JDBC method Connection.createStatement takes three int parameters (resultSetType, resultSetConcurrency, and resultSetHoldability), and ResultSet exposes a set of public static final int constants for the developer to pass in: TYPE_FORWARD_ONLY, CONCUR_READ_ONLY, HOLD_CURSORS_OVER_COMMIT and so on. The compiler has no way of telling whether the right constants are being passed in the right order, since all three arguments are plain ints. By defining a meta-annotation for type qualifiers, the JSR gives developers a way to declare the qualifier a value is required to carry. Returning to the nonnull example from earlier, this would be represented as:
@Documented
@TypeQualifier
@Retention(RetentionPolicy.RUNTIME)
public @interface Nonnull {
}
where
@Documented indicates that the annotation should appear in the generated JavaDoc
@TypeQualifier marks the annotation as a type qualifier
@Retention(RetentionPolicy.RUNTIME) makes the annotation available via reflection at run time
and a when argument, not shown in the declaration above, can be used to provide location-specific information.
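The same pattern could, in principle, be applied to the JDBC constants discussed earlier. The sketch below is an assumption-laden illustration rather than anything taken from the JSR: @TypeQualifier is a local stand-in so the code compiles on its own, @ResultSetType and @ResultSetConcurrency are hypothetical qualifiers, and describe is a made-up method with the same all-int shape as createStatement, reduced to two parameters for brevity.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.sql.ResultSet;

class TypeQualifierSketch {
    // Local stand-in for the proposed meta-annotation (assumption).
    @Documented
    @Target(ElementType.ANNOTATION_TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface TypeQualifier {}

    // Hypothetical qualifiers, one for each kind of JDBC int constant.
    @Documented @TypeQualifier @Retention(RetentionPolicy.RUNTIME)
    @interface ResultSetType {}

    @Documented @TypeQualifier @Retention(RetentionPolicy.RUNTIME)
    @interface ResultSetConcurrency {}

    // Both parameters are plain ints as far as the compiler is concerned;
    // the qualifiers record which family of constants each one expects.
    static String describe(@ResultSetType int type, @ResultSetConcurrency int concurrency) {
        return "type=" + type + ", concurrency=" + concurrency;
    }

    // A tool can recognise qualifiers by looking for the meta-annotation.
    static boolean isTypeQualifier(Class<? extends Annotation> annotation) {
        return annotation.isAnnotationPresent(TypeQualifier.class);
    }

    public static void main(String[] args) {
        // Intended argument order.
        System.out.println(describe(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY));
        // Swapped arguments: this still compiles, because both are just ints.
        System.out.println(describe(ResultSet.CONCUR_READ_ONLY, ResultSet.TYPE_FORWARD_ONLY));
        System.out.println(isTypeQualifier(ResultSetType.class));
    }
}
```

The swapped call in main is the mistake the compiler cannot see. An analysis tool would still need to know which qualifier each ResultSet constant carries before it could report it, which this sketch does not attempt to model.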
The JSR is expected to be delivered as part of Java 7, but it requires no language changes, and the expert group is aiming to support Java 5 and up. Members of the expert group include Sun, Google, JetBrains and Doug Lea.
One of the problems with JSR 305 is the fact that it attempts to enable a single function analysis tool (intra-procedural analysis) to act as if it were performing whole program analysis. It requires the developer to state expected behaviour up front (whether or not that behaviour is actually expressed correctly in the developer’s code).
This point is elaborated more completely at the Klocwork Blog - Kloctalk. Look for the post entitled JSR 305 - Silver Bullet or Not a Bullet at All
I'd be interested in your comments...
Cheers,
Chris
Enough said