# I. INTRODUCTION

Ranking information is one of the most important issues on the web, bearing in mind that there is no good or bad information. With this in mind, and drawing on conventional methods such as the impact factor for ranking scientific journals, our interest centered on the following research question: does a citation-index for websites make sense? The citation-index works similarly to existing ones, but with a special focus on your own publication. In other words, it does not directly compare the ranking of your publication with that of other publications; instead, it ranks relatively how often publications were cited by others. Applying the same principle to websites should lead to a better understanding of referencing websites that share the same interests, and could deepen the cooperation between such websites to reach a win-win situation. Unlike the impact factor, our citation-index is conceived for the web, so when we speak about other sites we mean websites like ours, which quote us and add a link as a reference. This link, as we explain later, is the key to measuring the criteria we set up for the index. Besides the theoretical part, we used the online book L3T (German textbook about Technology Enhanced Learning) (http://l3t.eu/) to gather the data needed for a well-founded assessment of the advantages this new index offers.

# II. THEORY

# a) Impact Factor

Publishing in scientific journals is very important for the career of a scientist, and choosing the right journal may be crucial. There are many journals, and they certainly differ in quality, which is hard to evaluate directly. A simple approximation, however, can be obtained through bibliographic research by counting the number of citations of articles published in a specific journal. One tool for estimating the relative prestige of journals in a given field is Journal Citation Reports (JCR).
JCR is an electronic resource that reports the total and average citation frequency as well as the impact factor. The impact factor of a journal is among the criteria considered when candidates are evaluated for promotion [Day, Gastel 2011 p. 30].

# Definition

"The impact factor is a measure of the frequency with which the "average article" in a journal has been cited in a particular year or period. The annual JCR impact factor is a ratio between citations and recent citable items published. Thus, the impact factor of a journal is calculated by dividing the number of current year citations to the source items published in that journal during the previous two years (see Figure 1)."

However, this index has its limitations. For example, it reflects the impact of the journal as a whole, not of individual articles. It is not interdisciplinary and cannot compare journals from different fields. Some journals also obtain a higher rating because replies to articles, which cite the article in question, are counted as citations but not as papers. Editors can increase the impact factors of their journals by publishing good polemical articles early in the year [Hartley 2008, p. 137].

First, the general rules must be set up and combined into an equation to form a ranking system. Many web analytics systems, such as Google Analytics, Piwik or Open Web Analytics, offer all the data that can be tracked from the user (called "raw data"), but present them without any relationship to each other. The purpose of our study was therefore to build a system, gather data, analyze them and draw conclusions about the feasibility of its application. We first give a brief introduction to web analytics itself and then present the web analytics framework that helps us gather the necessary data.
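The quoted definition leaves the divisor implicit: in the usual two-year impact factor, the citations received in year y by items published in years y-1 and y-2 are divided by the number of citable items the journal published in those two years. A minimal sketch of that ratio, with hypothetical journal numbers:

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year impact factor of a journal for year y.

    citations:     citations received during year y by items that the
                   journal published in years y-1 and y-2
    citable_items: number of citable items the journal published in
                   years y-1 and y-2
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


if __name__ == "__main__":
    # Hypothetical journal: 120 citable items in 2009 and 80 in 2010,
    # cited 500 times during 2011.
    print(impact_factor(500, 120 + 80))
```

The sketch also makes the limitation noted above visible: anything that adds to the numerator without adding to the denominator, such as replies that cite an article but are not counted as papers, raises the score.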
# c) WEB ANALYTICS

The Web Analytics Association (http://www.webanalyticsassociation.org) has proposed a standard definition for web analytics: "Web analytics is the objective tracking, collection, measurement, reporting, and analysis of quantitative Internet data to optimize websites and web marketing initiatives." [Kaushik, 2007 p. 6] Following this definition, collecting data is just one of many functions web analytics can and must fulfill. The data that are collected and measured are called clickstream information. Clickstream is foundational data that helps to measure and analyze all kinds of site behavior: visits, visitors, dwell time on site, page views, bounce rate, sources, and more. On the basis of these data, we can analyze the following aspects:

Many business models use web analytics for sales and/or promotion purposes, whether the site is an online shop, a blog or highly specialized financial software running in the browser.

Figure 2 shows the trinity framework [Kaushik, 2007, p. 18], a new way of perceiving web analytics for the most efficient data outcome. The goal of behavior analysis is to infer the intent of the website visitors based on everything we know about them, which is essentially clickstream data. The outcome is the result measured against the company's predefined objectives; for an e-commerce website, for example, how many visitors actually bought the product. For detailed analysis and an understanding of customer behavior, however, we need profiled web analytics.

# i. Web Analytics Frameworks

Web analytics tools date back to the early 1990s. Since then they have improved from simple request counting to highly accurate JavaScript clients, from log files to databases and from text outputs to impressive reporting methods. Besides commercial tools (see Table 1), there are some well-implemented open source competitors as well.
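To make the clickstream metrics named above more concrete, the following minimal Python sketch derives visits, unique visitors, page views, average dwell time and bounce rate from a list of visits. The `Visit` record is hypothetical and not the data model of any particular tool; bounce rate is taken as the share of single-page visits, which is the common definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Visit:
    visitor_id: str   # hypothetical anonymous visitor identifier
    page_views: int   # pages viewed during the visit
    duration_s: int   # dwell time on the site, in seconds


def clickstream_summary(visits: list[Visit]) -> dict[str, float]:
    """Aggregate basic clickstream metrics over a list of visits."""
    n = len(visits)
    if n == 0:
        return {"visits": 0, "visitors": 0, "page_views": 0,
                "avg_dwell_s": 0.0, "bounce_rate": 0.0}
    single_page = sum(1 for v in visits if v.page_views == 1)
    return {
        "visits": n,
        "visitors": len({v.visitor_id for v in visits}),
        "page_views": sum(v.page_views for v in visits),
        "avg_dwell_s": sum(v.duration_s for v in visits) / n,
        # bounce rate: share of visits that viewed only one page
        "bounce_rate": single_page / n,
    }


if __name__ == "__main__":
    sample = [Visit("a", 1, 10), Visit("a", 5, 300),
              Visit("b", 3, 200), Visit("c", 1, 5)]
    print(clickstream_summary(sample))
```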
Table 1

| Tool | Description |
|---|---|
| ClickTracks | ClickTracks provides an innovative line of products and hosted services in the field of website traffic analysis. ClickTracks focuses on presenting meaningful information about user behavior visually in context. |
| Coremetrics | The Coremetrics Web Analytics platform captures and stores all customer and visitor clickstream activity to build LIVE (Lifetime Individual Visitor Experience) profiles that serve as the basis for all successful e-business initiatives. |
| Google Analytics | Google Analytics offers free web analytics services with integrated analysis of AdWords and other keyword-based search advertising. |

Piwik gathers all its information from a JavaScript client called the Tracker Code, which is embedded in the websites to be observed. When a user opens the site, the client sends initial information to the server, including the browser specification, OS platform, language, referring link and so on. After that, the client continues to send information about user activities, such as clicks or time spent on a page. On the server side, Piwik has a well-developed MVC (Model-View-Controller) architecture based on Zend Framework (a PHP application framework). The plugins are themselves based on the MVC architecture and can be seen as applications within the application. Most of the data needed for our new metrics could be accessed from the Piwik database, so we did not need to collect new data from the client.

# d) DEFINING THE METRICS

The Tracker Code supplies the name of the external website that referred the user to the website of interest. The main idea is to identify the external websites that refer the best-fitting target group for the analyzed website. In other words, which external websites should we focus on, and which are worth investing more time in?
There are different parameters for measuring this: first, how many users come from a specific website to ours; second, how many actions these users perform on our website; and third, how long they stay on our website. The goal of this research work is to combine these three parameters into one formula and to visualize the usefulness of references pointing to our website. Piwik provides the raw data needed to build such a more complex analysis.

The first formula to be applied was intended to show the average values of incoming connections for a given time frame. Since we are working with incoming references from other websites, we called it "Reference Factor" (RF). The average reference factor is shown in Formula 1:

$$RF_{A}(rw) = \frac{\bar{a}_{rw}}{a_{s}} \cdot \frac{\bar{t}_{rw}}{t_{s}} \cdot 10^{6}$$

Formula 1: Reference Factor Average

Here, $\bar{a}_{rw}$ and $\bar{t}_{rw}$ are the average number of actions and the average visit time per visit of the reference website $rw$, and $a_{s}$ and $t_{s}$ are the actions and visit time recorded by the whole system.

Does a Citation-Index for Websites Make Sense?

Furthermore, a second formula, the "Multiplicative Reference Factor" (RFM), is needed: here we first take the ratio of the reference website's data to the system data and then multiply these ratios to set them in relation to each other. The average RF, in contrast, is built from the average values of the reference website (rw) and those of the system. Average values are important for creating a ranking based on quality instead of quantity. For example, website1 is in first place with 447 visits over six months, 2045 actions and 94789 seconds of visit time. Measured in average values, however, website2, with 39 visits, 474 actions and 42108 seconds of visit time, brings much more interested users who are willing to spend more time on our website: about 12 actions and 18 minutes per visit, compared with about 4.6 actions and 3.5 minutes for website1. This suggests that website1 users may have been misled or were just lurking, whereas website2 users were certain about the content and found what they were looking for. The RFM is a measurement scale that brings visit time and actions together to build a benchmark.
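The three parameters and the per-visit averages used in the website1/website2 comparison can be sketched in Python. This is a minimal sketch, not the authors' Piwik plugin: the row format with the keys `referrer`, `actions` and `visit_time_s` is a hypothetical stand-in for Piwik's visit data, and the function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def aggregate_by_referrer(rows: list[dict]) -> dict[str, tuple[int, int, int]]:
    """Sum (visits, actions, visit_time_s) per referring website.

    Each row describes one visit. The keys 'referrer', 'actions' and
    'visit_time_s' are hypothetical, not Piwik's actual column names.
    Visits without a referring website (direct traffic) are skipped.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        ref = row.get("referrer")
        if not ref:
            continue
        t = totals[ref]
        t[0] += 1
        t[1] += row["actions"]
        t[2] += row["visit_time_s"]
    return {ref: (v, a, s) for ref, (v, a, s) in totals.items()}


def per_visit_averages(visits: int, actions: int, visit_time_s: int) -> tuple[float, float]:
    """Average actions and average visit time (seconds) per visit."""
    return actions / visits, visit_time_s / visits


# Six-month totals (visits, actions, visit time in s) of the two example sites.
sites = {"website1": (447, 2045, 94789), "website2": (39, 474, 42108)}

# Ranking by quantity (visits) versus quality (average visit time).
by_visits = sorted(sites, key=lambda n: sites[n][0], reverse=True)
by_avg_time = sorted(sites, key=lambda n: per_visit_averages(*sites[n])[1], reverse=True)

if __name__ == "__main__":
    for name, site_totals in sites.items():
        avg_actions, avg_time = per_visit_averages(*site_totals)
        print(f"{name}: {avg_actions:.2f} actions and {avg_time / 60:.1f} min per visit")
    print("ranked by visits:", by_visits)
    print("ranked by average visit time:", by_avg_time)
```

The two orderings come out reversed, which is exactly the quantity-versus-quality difference the averages are meant to expose.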
A huge number of users coming from a website leads to a high multiplicative reference factor for that site, so popular sites always end up at the top. This is why both diagrams must be considered for an accurate overview. Both formulas were multiplied by powers of ten to make their values easier to read.

# e) Reporting mechanism

Figure 3 shows the implemented widget. It builds a table listing the reference websites in the first column, followed by their number of visits, actions, visit time, RFM and RFA. The table can be browsed and sorted. You can even "unfold" a site and look at the link where your tracker is placed, and you can build pie charts or vertical bar graphs within the widget. These graphs are satisfactory for one-dimensional values, but the RFs are composed of several values, so the visualization had to be multi-dimensional. For this we used MS Excel to build graphs from three dimensions: actions, visit time and visits. The yearly graph is shown in Figure 5.

# III. PROOF OF CONCEPT

Having finished the technical implementation, we tested the whole concept to prove its capabilities. The project L3T (http://l3t.eu), a German textbook on technology enhanced teaching and learning, was chosen for that purpose. The L3T project website already had Piwik installed, providing enough data for a sound analysis and conclusions. Piwik 1.1 lets the user choose between fixed time periods: day, week, month or year. Naturally, the more data are processed, the more accurate the result becomes. We therefore started with the annual period, which produced the table shown in Figure 3. This table provides all the information needed to draw a conclusion. If the same information is needed, for example for a presentation, the content can be exported and rendered in MS Excel with the supplied macro. The data used for this table are the same as for the simple ranking.
Applying the reference factors yields two new types of ranking, which may, but need not, match the simple ranking.

# IV. DISCUSSION

As mentioned before, the tables provide accurate data but do not lend themselves to quick visual interpretation, so the Excel graphs are used for the final discussion. The monthly periods are more dynamic and can reveal information that is smoothed out over longer periods. February 2011 was the first monthly data-gathering period. It shows that popular websites such as Facebook very soon referred a large number of users. However, it did not take long for other sites to contribute to the popularity of the project. In fact, in the remaining monthly reports, other profiled websites, such as those of universities or Wikipedia, were at the top of the RFM list. This is because users who were referred by those websites were more interested in the topic, which resulted in longer visit times and more actions within the website. Browsing and reading for a long time indicate that the user found what s/he was looking for. In March 2011, for example, many other websites appeared as big contributors, reaching the top-right area of the RFM graph with larger circles. Although they generated about the same number of visits during that time, the quality (actions, visit time) was not always predictable. According to the yearly graph, Wikipedia ranks first in raw data measurement and in the RFM because it generated a lot of traffic. A lot of traffic means that many users have visited the site and are familiar with its content. They might not have found what they needed, but they know what L3T is about and may refer to it the next time they need it. The average RF, on the other hand, tells us about the interest of the users regardless of the number of visits.
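The contrast between traffic (RFM) and user interest (average RF) can be illustrated with a minimal Python sketch of both reference factors. It rests on stated assumptions: RFM is taken to multiply the reference website's share of all actions by its share of all visit time (scaled by 10^2), and RFA to do the same with the website's per-visit averages (scaled by 10^6), with both shares taken relative to the system totals. The system totals are hypothetical; the two sites are website1 and website2 from the metrics section.

```python
def rf_multiplicative(actions: int, visit_time_s: int,
                      sys_actions: int, sys_time_s: int) -> float:
    """RFM (assumed form): share of all actions times share of all visit time, x 10^2."""
    return (actions / sys_actions) * (visit_time_s / sys_time_s) * 10**2


def rf_average(visits: int, actions: int, visit_time_s: int,
               sys_actions: int, sys_time_s: int) -> float:
    """RFA (assumed form): per-visit averages relative to the system totals, x 10^6."""
    avg_actions = actions / visits
    avg_time_s = visit_time_s / visits
    return (avg_actions / sys_actions) * (avg_time_s / sys_time_s) * 10**6


# Hypothetical system totals for the period (all referrers together).
SYS_ACTIONS, SYS_TIME_S = 30_000, 1_500_000

# (visits, actions, visit time in s) of the two example sites.
sites = {"website1": (447, 2045, 94789), "website2": (39, 474, 42108)}

rfm = {n: rf_multiplicative(a, t, SYS_ACTIONS, SYS_TIME_S) for n, (v, a, t) in sites.items()}
rfa = {n: rf_average(v, a, t, SYS_ACTIONS, SYS_TIME_S) for n, (v, a, t) in sites.items()}

if __name__ == "__main__":
    for name in sites:
        print(f"{name}: RFM={rfm[name]:.3f}  RFA={rfa[name]:.3f}")
```

Under these assumptions the high-traffic website1 leads the RFM, while website2, with fewer but more engaged visitors, leads the RFA, which is why both diagrams are read together.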
Figure 7, which displays the RFA for June, shows a small dot at the top-right edge of the graph. This dot represents moodle.uni-graz.at, which ranks first in the monthly ranking. Although Facebook, with a larger circle, has the highest number of visits, it has a smaller RFA than Moodle. Moodle referred only one visitor that month, but that person was so interested in the page that s/he spent over 40 minutes reading and performed over 20 actions, far above Facebook's average values. Finally, the best indication of a site's importance is that it ranks in the same area in both diagrams. This is the case, for example, for www.checkpoint-elearning.de (154 visits, 1330 actions) and www.e-teaching.org (164 visits, 1554 actions) in the yearly diagrams (Figure 5 and Figure 6). They have roughly the same number of visits and actions, with a similar ratio between the two (about 8.6 and 9.5 actions per visit, respectively). For such sites we can conclude that both the quantitative and the qualitative values are valid. Moreover, a site positioned in the upper-right area of the diagram offers interesting potential in our sense. For sites occupying the same area in both diagrams, we can therefore assume that the premise behind our research question holds; for the remaining sites, the fluctuation is too large to make a clear distinction. Besides the monthly reports, there are weekly analyses as well. Their amount of data is too small for drawing conclusions, but sufficient for staying up to date with the latest developments regarding your website's popularity. They can also serve as a history for comparing relevant changes. Out of the box, Microsoft Excel did not offer the possibility to build these three-dimensional graphs, in which the third dimension, the number of visits, is represented by the size of the circle.
The only option was therefore to write a VBA (Visual Basic for Applications) macro to build the graph and make the necessary changes. Some items would be underrepresented by a very small circle, so we set a default size for all items smaller than three units. The circle size is also adapted to the limits of the graph: a site with more than 1000 visits, for example, cannot have a circle of size 1000 because it would be too big to render. Therefore the largest value is divided by the maximum circle size to obtain a scale, and all other values are scaled accordingly. The macro was tested in MS Excel 2011 for Mac and MS Excel 2010 for Windows.

![Figure 1: Calculation of the journal impact factor (Source: thomsonreuters.com, July 2011)](image-2.png "Figure 1")

![Brand buzz and opinion tracking; customer satisfaction; net promoter indices; open-ended voice-of-customer analysis; visitor engagement; stickiness; blog-pulse](image-3.png)

![Figure 2: The trinity diagram (Source: Kaushik, 2007, p. 18)](image-4.png "Figure 2")

# a) PIWIK

Piwik is a downloadable, open source (GPL-licensed) web analytics program. As an alternative to services such as Google Analytics, Piwik allows you to host your statistics service on your own server and to have full ownership and control over the data collected from your visitors. A plugin offers a user interface that is very manageable and easy to use.

![](image-5.png)

![Figure 1: Piwik user interface (Source: l3t.tugraz.at Piwik)](image-6.png "Figure 1")

![Figure 3: Widget, pie chart view](image-7.png "Figure 3")

![Figure 2: Piwik RF widget](image-9.png "Figure 2")

![Figure 4: RFA period: year 2011](image-10.png "Figure 4")

Table 1 (continued)

| Tool | Description |
|---|---|
| NedStat | NedStat is a provider of software solutions and services for monitoring websites and reporting on website visits. |
| Omniture SiteCatalyst | SiteCatalyst is a hosted application that offers a comprehensive view of activity on a company's website, including historical (data warehouse) and real-time analysis as well as reporting. |
| SAS Web Analytics | SAS Web Analytics applies SAS Customer Intelligence software to online channels for a complete view of the customer's interaction. |

Google Analytics is based on Urchin, which Google purchased in 2005.

$$RF_{M}(rw) = \frac{a_{rw}}{a_{s}} \cdot \frac{t_{rw}}{t_{s}} \cdot 10^{2}$$

Formula 2: Reference Factor Multiplicative. The given shortcuts are explained in Table 2.

© 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XX Version I, December 2011

# V. CONCLUSIONS

This paper addresses the question of whether web analytics tools help us filter relevant web visitors by interpreting their link history. To this end, two new measurement methods for ranking the effectiveness of reference websites were defined.
Furthermore, these methods were implemented in an analytics system, and by exporting the data, complex graphs could be built with the help of external tools. Based on these reports, two different factors were calculated. The first is the multiplicative reference factor, which results from relating the raw data to each other; the second is the average reference factor, derived from the average values. The tested example (L3T project) has shown that the final diagrams help to interpret the usefulness of external references to the example project website. Web analytics remains a big field for online business. Ranking systems will become more sophisticated in trying to separate real opportunities from the noise of the Internet.

* Day, R., & Gastel, B. (2011). How to Write and Publish a Scientific Paper. Greenwood, ABC-CLIO LLC.
* Ebner, M., & Schön, S. (2011). Lehrbuch für Lernen und Lehren mit Technologien [Textbook for Learning and Teaching with Technologies]. BookOnDemand, Germany.
* Hartley, J. (2008). Academic Writing and Publishing: A Practical Guide. New York: Routledge.
* Kaushik, A. (2007). Web Analytics. Indianapolis, Indiana: Wiley Publishing, Inc.
* digitalenterprise.org. Managing the Digital Enterprise. Accessed 12.07.2011.
* Thomson Reuters. The Thomson Reuters Impact Factor.
* Piwik. Piwik Open Source Web Analytics. Accessed 12.07.2011.