Pune Effect: The accuracy of stats

I’ve talked many times in my end of month traffic reports about the ‘Pune’ effect. I have no great ambition for this post to rank; I offer it as an explanation of how I categorise freak statistical events. There are good and bad events, as you will soon see.

The explanation of the term ‘Pune’

Pune is a city in India, and traffic from there was the first time I noticed some pretty unusual stats: a spike in views that lifted one month far above any before it.

Pune, Maharashtra, India

I drilled into the data to understand what was going on and discovered that one individual visitor had viewed 75 of my posts in the space of 3 hours. This happened only once that month and came off the back of my commenting on a very high-authority article.
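If you want to spot this kind of visitor without drilling through the data by hand, something like the following rough sketch could do it. It assumes you can export page views as (visitor id, timestamp) pairs; the function name `find_bursts` and the default thresholds are my own illustration, borrowed from the 75-posts-in-3-hours figure above, not a feature of any particular stats package.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_bursts(views, window=timedelta(hours=3), threshold=75):
    """Return the visitor ids with at least `threshold` views inside any `window`.

    `views` is an iterable of (visitor_id, timestamp) pairs (an assumed export format).
    """
    by_visitor = defaultdict(list)
    for visitor, stamp in views:
        by_visitor[visitor].append(stamp)

    flagged = set()
    for visitor, stamps in by_visitor.items():
        stamps.sort()
        start = 0  # index of the oldest view still inside the window
        for end, stamp in enumerate(stamps):
            # Slide the window forward until it spans no more than `window`.
            while stamp - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(visitor)
                break
    return flagged


if __name__ == "__main__":
    base = datetime(2014, 1, 1, 9, 0)
    # Made-up data: one visitor reading a post every two minutes.
    sample = [("visitor-1", base + timedelta(minutes=2 * i)) for i in range(75)]
    print(find_bursts(sample))
```

Whether a flagged visitor is organic or synthetic is still a judgement call; the sketch only tells you where to look.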

I later saw these kinds of odd spikes from time to time and monitored them with interest. Uncharacteristic behaviour is the kind of data you stop and take note of.

Types of event

If you are a stat watcher like me, you want to know whether the event was organic or synthetic.

Organic events with a spike are great. They mean you’ve more than likely got an interested party checking out your stuff and giving it the thumbs up (or at least viewing your work as copy-worthy, which normally means it’s a winner). You want winning posts and this is a good indication that you have them.

Synthetic events are the kind you want to avoid, because artificial trawls tend to go hand in hand with higher bounce rates and lower time on site.
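To put a number on that, here is a small worked sketch. It assumes the usual definition of bounce rate (single-page sessions divided by all sessions) and uses made-up session counts purely for illustration.

```python
def bounce_rate(pages_per_session):
    """Share of sessions that viewed exactly one page (0.0 for no sessions)."""
    if not pages_per_session:
        return 0.0
    bounces = sum(1 for pages in pages_per_session if pages == 1)
    return bounces / len(pages_per_session)


if __name__ == "__main__":
    # Made-up figures: eight organic sessions, three of them single-page.
    organic = [3, 1, 5, 2, 1, 4, 2, 1]
    # A synthetic trawl adds twelve one-page hits on the same day.
    synthetic = [1] * 12

    print(bounce_rate(organic))              # 3 bounces out of 8 sessions
    print(bounce_rate(organic + synthetic))  # 15 bounces out of 20 sessions
```

In this invented example a dozen bot hits are enough to double the reported bounce rate, even though nothing about the real visitors has changed.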

Bots – Automated crawlers

Bots are okay in the background. Most of the time, the big stat collectors know to filter out this kind of background traffic. If you ever look at server-side stats such as AWStats or Webalizer (both commonly provided with hosted sites and normally found in cPanel), you will see lots of additional bot traffic. It only becomes a problem when it starts bleeding into your normal stats.
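As a rough illustration of how that background filtering can work, the sketch below splits hits into probable humans and probable bots by looking at the user-agent string. The marker list, the `user_agent` field name and the choice to treat an empty user agent as a bot are all assumptions of mine; real stat collectors use far longer lists and other signals as well.

```python
# Substrings that well-behaved crawlers commonly put in their user agent.
# This short list is illustrative, not exhaustive.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent):
    """Guess whether a hit came from a crawler, based on its user agent alone."""
    agent = (user_agent or "").lower()
    # Assumption: a missing user agent is more likely a script than a browser.
    return agent == "" or any(marker in agent for marker in BOT_MARKERS)


def split_traffic(hits):
    """Split hits (dicts with an assumed 'user_agent' key) into (humans, bots)."""
    humans, bots = [], []
    for hit in hits:
        if is_probable_bot(hit.get("user_agent")):
            bots.append(hit)
        else:
            humans.append(hit)
    return humans, bots
```

This only catches bots that announce themselves. The shadier ones pretend to be ordinary browsers, which is exactly how they end up bleeding into your normal stats.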

Obviously, you want your site to be listed and crawled by the search engines’ bots, but you don’t want that traffic leaking into your stats and doing harm that Google and other search engines will penalise you for.



A high bounce rate counts as a red flag for Google & Co.

You have the uncontrollable events of visitors who may stay for a few seconds but be so underwhelmed that they leave. This is organic loss. You stand a chance of repairing this by making sure the people who need the information find it and those who don’t steer clear. It takes a bit of know-how but can be done.

  • Remove 404 page navigation errors
  • Make sure the site doesn’t take too long to load
  • Improve your proofing (spelling, grammar, readability)
  • Be relevant
  • Don’t use traffic shortcuts

On the other side you may be targeted by bots from rather shady corners of the web. In some cases you can use a bit of data knowledge to block these activities.

One such company I tackled in 2014 was the Ukrainian Semalt.com. Unfortunately this company uses a rather invasive method of pinging your site, which results in a daily batch of high bounce rate views. I used a blocking method to deal with it and then wrote a guide. I didn’t expect much take-up and was surprised at how many other site owners and stat fans were suffering the same problem. Of course, a Semalt.com community manager very quickly provided a seed removal tool link via Twitter, but a guy was quickly on her back about Semalt.com’s less than ethical methods of market research.
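For anyone who only wants the noise out of their own numbers, a stats-side filter is one option. The sketch below is not the blocking method from my guide; it is a minimal illustration that drops hits whose referrer belongs to a blocklisted domain. The `referrer` field name and the blocklist contents are assumptions, and it expects referrers to be full URLs including the scheme.

```python
from urllib.parse import urlparse

# Domains whose referrals should be ignored (illustrative blocklist).
BLOCKED_REFERRERS = {"semalt.com"}


def is_blocked_referrer(referrer, blocked=BLOCKED_REFERRERS):
    """True if the referrer's host is a blocked domain or one of its subdomains."""
    host = (urlparse(referrer or "").hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def drop_referrer_spam(hits, blocked=BLOCKED_REFERRERS):
    """Return only the hits (dicts with an assumed 'referrer' key) worth counting."""
    return [hit for hit in hits if not is_blocked_referrer(hit.get("referrer"), blocked)]
```

Matching on the parsed host rather than searching the whole URL avoids throwing away a genuine visitor who merely searched for the domain name. Filtering like this tidies your own reports but does nothing to stop the pings reaching your server.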

If you are interested in the more brutal solution, my guide explains how to stop Semalt affecting your stats permanently.

If that kind of traffic bleeds into your numbers, the search engine may mark you down and reduce your potential in the SERPs (Search Engine Results Pages). It is well known that very few people look deeper than the third results page. If you are already facing harsh competition, these synthetic detractors will not help your cause.

Synthetic Violation

It sounds a bit wrong, doesn’t it! It doesn’t involve being tied to a bed by a robot, however (thankfully). Jests aside, synthetic violation is a problem, and one that we as a community have to band together to fight.

There are two main types of synthetic event:

  • Indirect
  • Targeted

Indirect violations

Indirect events are normally of short duration and just come across as a blip on the radar. You can usually isolate these events quite easily and they don’t do a great deal of harm in the long term. Sometimes they come about through a change in algorithms or a new service that has started up and set out to crawl the net. In isolation these events are okay; in numbers they can be a problem. Often there is not a whole lot you can do after the event. You just have to ride out the storm and make sure your site and stat collection can be filtered effectively.

In statistics there are always some odd numbers that don’t fit a pattern.

Targeted violations

This is the worst violation because someone has actively targeted your site. The culprits can be site optimisers who trawl your site synthetically with their bots, then mail you suggesting what needs work. They do this especially on hosted sites because they know that you might have money to invest in such work. There are many operators with a similar modus operandi, crawling your site to try to coax you into spending money.

Let me just say that this is bad market practice. Cold calling is not the best way to drive sales. Reputation has a much higher value.

If I want something doing, I’ll go and find a solution provider that looks good. I might even ask someone who has needed that service who they went with and whether they got good service and value for money.

Putting that in simpler terms:

I don’t like it when someone runs a finger across my dirty car and then knocks on the door, expectant, with a bucket, soapy water and a sponge (not that this happens).

I don’t mind receiving a leaflet through my door saying that there is a friendly car washer working in the area (this is the norm). There is a chance I might take up the offer if I can see other cars washed well.

Targeted violations can occur by command or through systematic deduction. Sayeth what?

By command

The violator may isolate a set of prospects through automatic means (aggregation) then manually give the go code. This means that a trigger man (or woman) is involved on a personal level when processing the eventual response.

By systematic deduction

This is very much like the above but entirely automatic, removing the human from the loop altogether. Where your domain sits in terms of niche, and what letter the domain name starts with, will determine how soon you get hit with this violation. Everything will be dealt with automatically, including, in most cases, your interaction with the company.

In conclusion

Violations like those above don’t happen often, but when they do they can be a source of pain if you are an avid stat-watcher. They can give you a false sense of how well you are doing. The issue we have is the gullibility of some site owners in saying ‘yes’ to bad practice. This makes it bad for everyone because, spurred on by success, these amoral companies will amp up their bad marketing practices. The reality is that for every dodgy bit of marketing that drops into your inbox, there is a very high probability that someone has been a mug and said ‘yes’.

The Pune Effect is just a fun name I use to attach to unusual events. It is when those events become the norm (i.e. an uplift in quality traffic and time on site) that I smile.

What are your Pune effects?
