Although it may seem that way, I think the idea is somewhat flawed. What compressors do is squeeze out redundancy. If they work well, that is.
Dictionary-based compressors replace parts of the input with references to entries in their dictionary whenever that takes fewer bits, according to a scheme the programmer has fixed in advance. Maybe the programmer is intelligent, but the program doesn't do anything special.
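To make the point concrete, here is a toy sketch of that mechanical substitution (not any real codec; the dictionary entries and code bytes are made up for illustration). The program just applies a rule someone else decided:

```python
# Hypothetical toy dictionary: common phrases mapped to 1-byte codes.
# The programmer chose these entries; the program applies them blindly.
DICTIONARY = {"the ": "\x01", "and ": "\x02"}

def compress(text: str) -> str:
    """Replace dictionary phrases with shorter codes."""
    for phrase, code in DICTIONARY.items():
        text = text.replace(phrase, code)
    return text

def decompress(data: str) -> str:
    """Undo the substitution (codes are assumed not to occur in the input)."""
    for phrase, code in DICTIONARY.items():
        data = data.replace(code, phrase)
    return data

sample = "the cat and the dog"
packed = compress(sample)
assert decompress(packed) == sample
print(len(sample), len(packed))  # the packed form is shorter
```

Nothing here looks anywhere near intelligence; the cleverness, such as it is, sits entirely in whoever picked the dictionary.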
Statistical compressors, on the other hand, estimate the probability of the next symbol from what they have seen before. You could much more easily associate this with "intelligence", even though they still only follow a very rigid pattern that the human programmer has devised.
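A minimal sketch of what such a model does, assuming a simple order-1 context (one preceding character) and a hypothetical `Order1Model` class, nothing like a production compressor: it counts which symbols followed which contexts and "predicts" the most frequent successor. A rigid counting rule, however clever it may look:

```python
from collections import Counter, defaultdict

class Order1Model:
    """Toy order-1 model: predict the next character from the current one."""

    def __init__(self):
        # counts[context][symbol] = how often `symbol` followed `context`
        self.counts = defaultdict(Counter)

    def update(self, context, symbol):
        self.counts[context][symbol] += 1

    def predict(self, context):
        """Most frequent successor seen so far, or None for a new context."""
        seen = self.counts[context]
        return seen.most_common(1)[0][0] if seen else None

model = Order1Model()
text = "abracadabra"
for prev, cur in zip(text, text[1:]):
    model.update(prev, cur)

print(model.predict("a"))  # 'b' followed 'a' more often than 'c' or 'd'
```

A real statistical coder would turn those counts into probabilities and feed them to an arithmetic coder, but the "crystal ball" part is exactly this: counting the past and betting on the most frequent continuation.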
But if you think about it, this behavior is very much akin to a gambler playing roulette who, having seen 17 come up three times, puts his money on 17. After all, 17 seems to be the lucky number. Would you deem this intelligent? Secret tip: a horse with a name like "Lightning" cannot lose.
Maybe PPMD uses a somewhat more sophisticated algorithm, but in the end it does the exact same thing: looking into the crystal ball.
The only difference when compressing enwik8 (and it is the difference that decides the outcome!) is that the input is different. Characters in enwik8 are not random but highly correlated, so there is a huge amount of redundancy in that text. This is why compressors are more successful than our roulette gambler. Still, they do more or less the same thing: they use some statistical model, and if they are lucky, this allows them to correctly predict the coming symbols. The roulette wheel gives the model nothing to work with; enwik8 does.
What would be more impressive is compression in the sense that you show an image to a computer and it outputs "fat guy in funny clothes making a sad face, bent over a dead girl, that's Pavarotti as Rigoletto". That is much closer to the way a human would "compress" that photo.