Art Critic: An automatic painting classifier
Art Critic is a program that can be trained to identify a color image of a painting as being an example of a given style of art. We trained the system to distinguish Impressionist and post-Impressionist art from other periods, including Cubist, Surrealist, and Baroque. The system was only moderately successful, and suffered from several limitations in its design.
Classifying artwork requires relatively high-resolution renderings, resulting in raw data of very high dimensionality. In addition, much of the relevant information is local detail and texture.
We used a three-stage classification system, designed to reduce the volume of the data without losing useful features and the local detail.
Data and Training:
Our data set consisted of 212 JPEG images of paintings by ten well-known artists, representing the Renaissance (Michelangelo), Baroque (Rembrandt, Rubens), Impressionism and post-Impressionism (Monet, Renoir, Cezanne, Van Gogh), Cubism (Braque, Picasso), and Surrealism (Dali). About half of the images were from Impressionists and post-Impressionists. Most images were between 500x500 and 1000x1000 pixels.
The neural network classifier was trained with 16800 labeled segments (the 100 largest segments from each of 168 images). 90% of the segments were used for training while the other 10% were held out for validation against overfitting. The remaining 44 images, never seen while training, were used to evaluate the full classification system.
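The split arithmetic above can be sketched as follows. This is only an illustration, assuming the 90/10 split was drawn at random over individual segments (the text does not say whether it was done per segment or per image). The function name, the seed, and the (image, rank) pairs are hypothetical placeholders for the real segment data.

```python
import random

def split_segments(num_images=168, segments_per_image=100,
                   val_fraction=0.10, seed=0):
    """Return (train, validation) lists of (image_index, segment_rank) pairs.

    Assumption: segments are shuffled and split individually, not per image.
    """
    segments = [(img, rank)
                for img in range(num_images)
                for rank in range(segments_per_image)]
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    rng.shuffle(segments)
    n_val = round(len(segments) * val_fraction)
    return segments[n_val:], segments[:n_val]

train, val = split_segments()
print(len(train), len(val))  # 15120 1680
```

With 168 training images and 100 segments each, this gives 15120 training segments and 1680 validation segments. The 44 evaluation images stay outside this pool entirely.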
Segmentation: Image segmentation results were frequently ugly, especially on paintings with highly varying colors. Using segmentation instead of a simple color histogram was probably not helpful in these cases. On other images, the segmentation captured many human-recognizable features and was likely a constructive processing step.
Above: Van Gogh self-portrait with noisy segmentation
Classification: The neural network design went through three evolutionary stages as we tried to improve its performance:
There are several basic problems involved in classifying impressionist works:
1) Most impressionist works use a great deal of color. Because they frequently depict natural scenes, the color is often predominantly green or blue. As a result, the classifier tends to mark any blue or green region as impressionist, and any dark region as non-impressionist. This tendency is reinforced by the fact that a region classifier has little else to go on: the six color features are vital for classification. It tends to result in a binary red/green classifier, however. Considering multiple regions would alleviate this.
Even taking multiple regions into consideration, color is not sufficient to disambiguate a painting's stylistic category. Many painters used palettes that could belong to either category. Braque is a good example of this:
2) Most impressionist works have soft lines, making the breaking of the
image into regions difficult. Because the segmentation algorithm is
limited to a certain sensitivity, the tendency in such a case is to end up
with either an image with many small, uninformative regions (high merging
threshold), or one large, uninformative region (low merging
threshold). The blurrier the lines, the narrower the good cross-over
This could potentially be a useful feature, but quantifying 'line softness' is an iffy matter at best.
3) Most impressionist works are heavily textured. This has a similar effect on the segmentation as (2), but with the additional difficulty that many other periods also use heavy texture. Cubism, in particular, uses a similar style.
4) Impressionism is basically a context-driven distinction - each "distinguishing" feature could also be used to distinguish some other group. Indeed, it is not the features themselves that make a painting "impressionist" - whether it is impressionist is defined by the difference between the thing represented and the representation. Such a distinction cannot be practically captured. Consequently, any categorizing algorithm will be more a heuristic than anything else.
The third problem is actually one of the major reasons that it is advantageous to first segment the image and then merge compatible regions: without such merging, the regions resulting from a highly-textured impressionist work and a smooth traditional painting are comparable. It is only by combining regions that you get large tell-tale regions in the non-impressionist paintings while leaving a number of smaller regions in the impressionist paintings. It also prevents anomalous areas from gaining too much weight.
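The effect of merging can be illustrated with a small sketch. This is not our segmentation code; it is a simplified greedy merge, assuming each region is summarized by a mean RGB color and a pixel count, and that two adjacent regions are "compatible" when their mean colors are close. The names merge_regions and max_color_distance are hypothetical, and in this sketch a larger max_color_distance means more merging, which may not match the sense of the merging threshold used elsewhere in this report.

```python
def merge_regions(mean_colors, sizes, adjacency, max_color_distance):
    """Greedily merge adjacent regions with similar mean colors.

    mean_colors: list of (r, g, b) tuples, one per region
    sizes: list of pixel counts, one per region
    adjacency: list of (i, j) pairs of neighbouring regions
    Returns a list giving, for each region, the id of its merged group.
    """
    parent = list(range(len(sizes)))
    colors = [list(c) for c in mean_colors]
    sizes = list(sizes)

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    changed = True
    while changed:  # repeat passes until no pair merges
        changed = False
        for i, j in adjacency:
            a, b = find(i), find(j)
            if a == b:
                continue
            dist = sum((x - y) ** 2
                       for x, y in zip(colors[a], colors[b])) ** 0.5
            if dist < max_color_distance:
                total = sizes[a] + sizes[b]
                # The merged group's color is the size-weighted mean.
                colors[a] = [(x * sizes[a] + y * sizes[b]) / total
                             for x, y in zip(colors[a], colors[b])]
                sizes[a] = total
                parent[b] = a
                changed = True
    return [find(i) for i in range(len(parent))]

# Four neighbouring regions in a chain, as a toy example.
chain = [(0, 1), (1, 2), (2, 3)]
smooth = [(100, 100, 100), (102, 101, 99), (101, 100, 100), (99, 100, 101)]
textured = [(100, 100, 100), (160, 90, 40), (40, 150, 90), (200, 200, 50)]
print(len(set(merge_regions(smooth, [50] * 4, chain, 10))))    # 1
print(len(set(merge_regions(textured, [50] * 4, chain, 10))))  # 4
```

In the toy example, the nearly uniform "smooth" regions collapse into one large region, while the strongly varying "textured" regions stay separate, which is the contrast the merging step is meant to expose.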
These problems make a good balance of features particularly important. There should be enough line information to help offset the overweighted color information, and enough regions considered to prevent a painting being classified by an anomalous region.
The only line type features our algorithm considered were boundary length
(length around a region), and average curvature (higher curvature being
The first feature did little good because, in addition to the major regions, we also considered smaller regions to obtain more data. For major regions, boundary length may give a good indication of painting type, because impressionist works tend to end up with more, smaller regions. However, the majority of regions analyzed fell in the medium or small range, where boundary length is more or less the same regardless of type.
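For concreteness, boundary length (the length around a region) can be computed from a label image roughly as follows. This is a sketch under assumptions, not our feature extractor: it assumes the segmentation is stored as a 2-D grid of integer region labels and measures the boundary by counting pixel edges that face a different label or the image border. The function name boundary_length is hypothetical.

```python
def boundary_length(labels, region):
    """Count the pixel edges of `region` that face another label or the border.

    labels: 2-D list of integer region labels (rows of equal length)
    region: the label whose boundary is measured
    """
    rows, cols = len(labels), len(labels[0])
    length = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != region:
                continue
            # Check the four edge-adjacent neighbours of this pixel.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or labels[nr][nc] != region:
                    length += 1
    return length

grid = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 2]]
print(boundary_length(grid, 1))  # 8 (a 2x2 block)
print(boundary_length(grid, 2))  # 4 (a single pixel)
```

Small and medium regions give similar values under this measure whatever the painting style, which is consistent with the limited usefulness described above.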
The average curvature was a potentially useful feature; the main problem with it was that it did not consider a wide enough window when making its curvature judgements. To properly recognize the difference between these two line segments
it is necessary to consider a window of at least two pixels in every direction. However, this becomes intractable very quickly.
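The effect of window size can be shown with a small sketch. This is not the curvature measure we implemented; it is a substitute technique (mean absolute turning angle along an ordered list of boundary points), and the function name average_curvature and the window parameter are hypothetical. It assumes the boundary has already been traced into an ordered sequence of (x, y) pixel coordinates.

```python
import math

def average_curvature(points, window=2):
    """Mean absolute turning angle (radians) along an open pixel path.

    At each point, compare the direction from the point `window` steps
    back with the direction to the point `window` steps ahead.
    """
    turns = []
    for i in range(window, len(points) - window):
        x0, y0 = points[i - window]
        x1, y1 = points[i]
        x2, y2 = points[i + window]
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        turn = (angle_out - angle_in + math.pi) % (2 * math.pi) - math.pi
        turns.append(abs(turn))
    return sum(turns) / len(turns) if turns else 0.0

# A diagonal line as it appears on a pixel grid: a staircase.
stair = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2),
         (3, 2), (3, 3), (4, 3), (4, 4)]
print(average_curvature(stair, window=1))  # every step looks like a 90-degree turn
print(average_curvature(stair, window=2))  # the staircase reads as a straight line
```

With a one-pixel window the straight diagonal is indistinguishable from a jagged edge, while a two-pixel window recovers it as straight. This turning-angle formulation scales linearly with boundary length; it sidesteps, rather than solves, the cost problem of the approach described above.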
Combinations of regions:
For these reasons, to get decent results for an overall picture, it would
be necessary to consider a number of regions at the same time. The
simplistic difficulty with this is that the number of regions varies, so
there has to be at least some degree of separation between the image
analysis and individual region analysis.
We tried two voting-type techniques for taking multiple (5-10) regions into account at the same time, with limited success.
The first, a simple voting scheme, worked moderately well, boosting the accuracy by an average of 5 percent. The obvious problem with this is that it gives all results equal weight, when in fact some might be far more important than others.
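A simple voting scheme of this kind might look like the following sketch. It assumes each region classifier output is a score between 0 and 1, with values above a cutoff counted as a vote for "impressionist", and that ties go to "non-impressionist". The function name vote, the 0.5 cutoff, and the tie rule are assumptions for illustration, not details of our implementation.

```python
def vote(region_scores, cutoff=0.5):
    """Majority vote over per-region scores; True means 'impressionist'.

    Every region counts equally, regardless of size or confidence.
    Ties are resolved as non-impressionist (an arbitrary choice).
    """
    yes_votes = sum(1 for score in region_scores if score > cutoff)
    return yes_votes * 2 > len(region_scores)

print(vote([0.9, 0.8, 0.3, 0.6, 0.2]))  # True: 3 of 5 regions vote yes
```

Note that the barely positive 0.6 counts exactly as much as the confident 0.9, which is the equal-weight weakness described above.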
In an attempt to solve this problem, we linked several evaluated regions together using another neural network. This actually ended up giving worse percentages than the regions taken individually. Here the problem was probably lack of data: because our limited training data was reduced even further by analyzing several regions together, there were too few examples from which to extrapolate a reasonable rule.
Although the end algorithm was only moderately successful, given more data
it should not be difficult to get a good classifier. The most notable
limitation of our implementation lay in the fact that we mostly considered
regions individually. On the regional level, it is very difficult to tell
the difference between a painting in one style and a painting in another -
many of the components are very much the same. This painting may have this
nice little green region - it doesn't mean that the painting is
impressionist. To solve this problem effectively, a network combining most
of the region results would be necessary. A good classifier would probably
require combining 50-100 regions.
Although this would require a huge amount of data to train properly, the initial findings we have shown here suggest that it could be quite successful. Such an extension could also benefit from further features, but this might not be feasible, since the more complicated features often take an impractical amount of time to derive.