Methodology

Measurements were conducted in a manner that is statistically valid, repeatable and technically consistent, and that provides absolute comparability of KPIs across all networks and technologies. All KPIs measured in the tests are defined in ETSI standards and ITU recommendations.

The scope of this year’s project is the largest so far in the seven-year history of RATEL’s benchmarking of the Serbian mobile operators.
It included drive tests, hotspot locations and railways, to an extent never before seen in the Serbian market.

The map of covered routes is presented in Figure 1.

The categorisation applied is fully compliant with ETSI 103 559 Annex A.

Drive Test

The campaign covered 55 cities/towns and 17,000 km of Serbian roads and was divided into area type categories: Big Cities, Medium Cities, Small Cities, Highways, Main Roads and Rural Roads. The categorisation is fully compliant with the ETSI 103 559 Annex A recommendations. The number of test samples collected during the drive tests (over 9000 voice calls and around 7000 data service tests in each network) yields a standard error better than 3% at a confidence level above 95%.
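The report does not state how the accuracy figure was derived. As a rough plausibility check only, the sketch below assumes that a KPI is a success ratio estimated from independent test samples and uses the normal approximation for a binomial proportion; the function names are illustrative and not part of the measurement toolchain.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a success ratio p estimated from n independent tests."""
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval (z = 1.96 for 95%)."""
    return z * proportion_standard_error(p, n)


# p = 0.5 is the worst case: it gives the widest interval for a given n.
for label, n in [("voice calls", 9000), ("data tests", 7000)]:
    print(f"{label}: n={n}, margin of error = {margin_of_error(0.5, n):.2%}")
```

Under these assumptions the whole-campaign sample counts stay well inside the 3% bound. Per-category results (for example Small Cities alone) rest on fewer samples and would have wider intervals.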

Big Cities:

Beograd
Kragujevac
Niš
Novi Sad
Subotica

Medium Cities:

Bor
Čačak
Jagodina
Kikinda
Kraljevo
Kruševac
Leskovac
Loznica
Novi Pazar
Pančevo
Pirot
Požarevac
Prokuplje
Šabac
Smederevo
Sombor
Sremska Mitrovica
Užice
Valjevo
Vranje
Vršac
Zaječar
Zrenjanin

Small Cities:

Aleksinac
Apatin
Aranđelovac
Bačka Palanka
Bačka Topola
Bečej
Bujanovac
Ćuprija
Gornji Milanovac
Inđija
Knjaževac
Kula
Lazarevac
Mladenovac
Negotin
Obrenovac
Paraćin
Preševo
Priboj
Prijepolje
Ruma
Sjenica
Smederevska Palanka
Stara Pazova
Temerin
Vrbas
Vrnjačka Banja

Measurements were performed in drive test mode, which means that the measurement equipment was installed in moving vehicles. The measurement equipment collects network data by running voice and data tests and by using a scanner to obtain radio network parameters. All three mobile networks were measured at the same time and on the same drive test routes, using the same smartphone model, the Samsung Galaxy S21+, for all tested voice scenarios. The Samsung Galaxy S21+ was equipped with operator firmware to support all the latest network features (including VoLTE – Voice over LTE calls). The Samsung S21+ is a Cat.20 mobile device, which means that it supports 4G (LTE) data speeds of up to 2 Gbps for receiving data and 200 Mbps for sending data. All smartphones worked in automatic technology selection mode. To reflect the latest technical developments in the mobile networks and to examine the benefits of the available capabilities, SIM cards with the most comprehensive mobile tariff plans available from each operator (the highest data rates, the highest number of minutes and the largest data volume) were used.

Mobile tariff plans used for network testing are shown in Table 1:

Operator	Tariff plan used for voice tests	Tariff plan used for data tests
Telekom Srbija	SOKO Max	SOKO Max
Yettel	Total Data+	Total Data+
A1 Srbija	NeoNeo	NeoNeo

Table 1. Mobile Tariff plans – Drive test

The measurement system consisted of two test cars equipped with identical measurement equipment (SwissQual SmartBenchmarker) capable of measuring all network technologies and services simultaneously with very high accuracy. To perform voice tests, the Samsung Galaxy S21+ smartphones called each other continuously within the same mobile network. In Big Cities, voice tests were executed in a mobile-to-mobile scenario between the two cars; in Medium and Small Cities and on the Roads, mobile-to-mobile calls were performed within the same car. The two-car setup in Big Cities allowed effective data collection without generating too much voice traffic within a single radio cell in areas where higher mobile traffic is expected. Voice tests assess network accessibility, retainability and speech quality. Each voice call measured during the benchmarking lasted 85 seconds.

Test equipment installed in cars is presented in Figure 2.

Fig. 2. Test Equipment installed in cars

Additional data was sent or received during each voice test call to simulate the behaviour of a regular smartphone subscriber, since background data transmissions are typical during a voice call. For each voice call, the quality of the speech samples was measured as MOS (Mean Opinion Score) using the standard ITU-T P.863 (POLQA) algorithm.

Data tests were performed between Samsung S21+ smartphones and a dedicated measurement server located at the Serbian Open eXchange (SoX) in Belgrade, which ensured a fair transmission path to all three mobile networks. Data tests assess network availability, stability, typical performance and the highest capabilities. The most representative data services measured during benchmarking were:

Small file Transfer - Download (throughput of a 3 MB file transmission over HTTP protocol from the measurement server to the smartphone). The small file transfer Download test is designed to measure network responsiveness and to simulate a user downloading small files such as pictures, mp3 files or email attachments.
Small file Transfer - Upload (throughput of a 1 MB file transmission over HTTP protocol from the smartphone to the measurement server). The small file transfer Upload test is designed to measure network responsiveness and to simulate a user uploading small files such as pictures, mp3 files or email attachments.
Web Browsing Static (testing how fast the reference ETSI Kepler web page is received and opened on smartphones).
Web Browsing Live Page (testing how fast real web pages are received and opened on smartphones).
YouTube (testing the quality of live stream video transmission).
Ping (measuring the delay introduced by the network between sending and receiving packets).

To simulate the behaviour of an average mobile subscriber in Serbia surfing the Internet, a set of websites was tested, based on their popularity amongst Serbian users:

instagram.com
tiktok.com
kupindo.com
oglasi.rs

In the YouTube test, the quality of a live stream was measured (VMOS - Video Mean Opinion Score) by using the standard J.343.1 algorithm.

WhatsApp messaging was tested as well. A pair of Samsung Galaxy S21+ terminals was included in the measurement systems, with sending and receiving handled by separate terminals. The aim of this setup is to capture the end-user experience of the WhatsApp messaging service over mobile networks.

A scanner (Rohde & Schwarz TSME) was used to test radio parameters of the mobile networks. SwissQual NQDI software was used for network data post-processing and reporting. The post-processing activity was supported by the Systemics-PAB proprietary Data Warehouse for customized analysis.

Walk test

The campaign was divided into two parts as per ETSI 103 559 Annex A. Hotspot measurements covered locations in Belgrade, Novi Sad and Niš. Railway corridor measurements covered 1600 km, in both directions, of the main Serbian railway routes: Corridor X and Corridor XI.

Measurements in walk test locations were performed in a nomadic mode, which means that the measurement equipment was installed in a backpack carried by the measurement engineer.

The measurement equipment collects network data by running voice and data tests and by using a scanner to obtain radio network parameters. All three mobile networks were measured at the same time and on the same walk test routes, using the same smartphone model, the Samsung Galaxy S21+, for voice, WhatsApp and data services. The Samsung Galaxy S21+ was equipped with operator firmware to support all the latest network features (including VoLTE – Voice over LTE calls). The Samsung S21+ supports 4G (LTE) data speeds of up to 2 Gbps for receiving data and 200 Mbps for sending data. All smartphones worked in automatic technology selection mode. To reflect the latest technical developments in the mobile networks and to examine the benefits of the available capabilities, SIM cards with the most comprehensive mobile tariff plans available from each operator (the highest data rates, the highest number of minutes and the largest data volume) were used.

Mobile tariff plans used for testing networks are shown in Table 2:

Operator	Tariff plan used for voice tests	Tariff plan used for data tests
Telekom Srbija	SOKO Max	SOKO Max
Yettel	Total Data+	Total Data+
A1 Srbija	NeoNeo	NeoNeo

Table 2. Mobile Tariff plans – HotSpots

The measurement system consisted of a backpack (SwissQual Freerider III) equipped with identical measurement terminals capable of measuring all network technologies with very high accuracy. To perform voice tests, the Samsung Galaxy S21+ smartphones called each other continuously within the same mobile network. Voice tests were executed in a mobile-to-mobile scenario between terminals in the same backpack. The Samsung S21+ was also used to perform the data tests. The same measurement scenarios as in the drive test were used for hotspot measurements.

A scanner (Rohde & Schwarz TSME) was used to test radio parameters of the mobile networks. SwissQual NQDI software was used for network data post-processing and reporting. The post-processing activity was supported by the Systemics-PAB proprietary Data Warehouse for customized analysis.

Test equipment used in Walk Test measurements is presented in Figure 3.

Fig. 3. Test Equipment used in Walk Test measurements

Walk test measurement results are a part of the ETSI 103 559 Annex A scoring methodology and are considered for the final score calculation.

Hot Spot

Hot spot measurements included outdoor and indoor locations. The locations covered were chosen based on popularity and public interest.

Belgrade

Ada Ciganlija Lake
Airport Nikola Tesla
Belgrade Waterfront
Beo Shopping Mall
Galerija Shopping Mall
Knez Mihajlova & Kalemegdan
Ušće Shopping Mall

Novi Sad

Novi Sad pedestrian zone
Promenada Novi Sad

Niš

Niš pedestrian zone

Hot spot measurement results are a part of the ETSI 103 559 Annex A scoring methodology and are considered for the final score calculation.

Scoring Methodology

The scoring methodology was developed to assess user perception of voice, data and video services from the benchmarking measurements, with the purpose of ranking the measured mobile network operators.

The main challenge is to select Key Performance Indicators (KPIs) which represent true user experience, and to develop the algorithm which allows the calculation of the unified quality metric for each operator – the score. In the ETSI 103 559 Annex A approach, the score reflects user experience and focuses on issues which have high impact on user dissatisfaction.

For each service tested, there are at least two metrics identified which meet the suggested scoring concept.

The usability of the service is assessed using the Call or Session Setup Success Rate (CSSR for voice tests or SSSR for data tests). CSSR/SSSR is an indicator tested for all types of services and is defined as the ratio of successful test setups to all test attempts. The exact method of calculation is given by the formula below:

CSSR/SSSR = (all successful setup attempts for the service) / (all test attempts for the service)
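The ratio above can be sketched as a small helper. The function name and the sample counts are illustrative assumptions, not values taken from the measurement campaign.

```python
def setup_success_rate(successful_setups: int, attempts: int) -> float:
    """CSSR/SSSR: successful setup attempts divided by all test attempts."""
    if attempts <= 0:
        raise ValueError("at least one test attempt is required")
    if not 0 <= successful_setups <= attempts:
        raise ValueError("successful setups must lie between 0 and attempts")
    return successful_setups / attempts


# Hypothetical counts, for illustration only.
rate = setup_success_rate(successful_setups=8955, attempts=9000)
print(f"CSSR = {rate:.2%}")
```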

The second metric is the quality of the given service itself. This metric depends on the type of test executed. For voice calls it is assessed as the speech quality, while for data tests it is assessed as the data throughput, the session time or, in the case of YouTube tests, the video stream quality. For services where consistent quality of service is important, the usability of the service is additionally assessed by calculating the percentage of tests or samples with bad quality (voice samples with a MOS < 1.6 for voice tests, data transfer tests with a throughput below the minimum expected value). This approach reflects how consistently the network delivers quality (the distribution of the quality of service should be close to its average value).

MOS stands for Mean Opinion Score for the quality of voice services. It measures subjective perception of the voice quality by the listener. It ranges between 1 and 5, with 5 being the best.
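The share of bad-quality samples described above can be computed as follows. This is a minimal sketch: the 1.6 MOS limit comes from the text, while the function name and the sample values are made up for illustration.

```python
from typing import Iterable


def bad_sample_share(values: Iterable[float], threshold: float) -> float:
    """Fraction of samples whose value falls strictly below the threshold."""
    samples = list(values)
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for v in samples if v < threshold) / len(samples)


# Hypothetical MOS values for eight speech samples.
mos_samples = [4.2, 3.9, 1.4, 4.4, 1.5, 3.1, 4.0, 2.8]
print(f"share of samples with MOS < 1.6: {bad_sample_share(mos_samples, 1.6):.1%}")
```

The same helper applies to data tests by passing throughput values and the minimum expected throughput as the threshold.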

For each of the KPIs measured, a threshold is assumed as the minimum requirement that has to be met in order to achieve scoring points. This threshold is set at a level that represents the status of technology development (minimum data throughputs meeting the assumptions for the implementation of 3G or 4G technologies), the usable quality of service (the MOS threshold below which speech quality is poor) or the network accessibility level corresponding to the grade of service used for telecommunication services.

The quality KPIs also have maximum values used for scoring calculations. Each maximum is interpreted as the target above which users do not perceive a real difference in the quality of service, so it is not a differentiating factor in the user’s perception. Operators are not awarded additional points for exceeding the maximum threshold.

For all KPIs, a linear function is used to calculate how well a given KPI meets the expectations (within the min-max range). If a higher value of the KPI is better (e.g. speech quality or throughput), the score is calculated directly; if a lower value is better (e.g. call setup time or share of samples with low throughput), the calculation is reversed.
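A minimal sketch of such a linear mapping is shown below. It assumes the score is clamped to the range 0 to 1 outside the thresholds, consistent with no points below the minimum and no extra points above the maximum. The threshold values are hypothetical; the actual values are defined in ETSI 103 559 Annex A and are not reproduced in this section.

```python
def linear_score(value: float, worst: float, best: float) -> float:
    """Map a KPI value linearly onto [0, 1].

    `worst` is the threshold that earns no points and `best` is the target
    that earns full points. Passing best < worst handles KPIs where a lower
    value is better, which reverses the direction of the mapping.
    """
    if worst == best:
        raise ValueError("worst and best thresholds must differ")
    fraction = (value - worst) / (best - worst)
    return max(0.0, min(1.0, fraction))


# Hypothetical thresholds, for illustration only.
print(linear_score(3.15, worst=2.0, best=4.3))   # higher is better (speech quality)
print(linear_score(7.0, worst=10.0, best=4.0))   # lower is better (call setup time, s)
print(linear_score(2.0, worst=10.0, best=4.0))   # beyond the target: capped at full points
```

Using a single function with the thresholds swapped avoids maintaining two separate formulas for the direct and the reversed case.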

The data gathered during the benchmarking tests are used to calculate ranking scores for all measured mobile operators for voice and data services.

The importance of the voice service is set to 40%, while all data services sum up to a total of 60%. Details are presented below.

Voice services, affecting 40% of the overall score
Web browsing, affecting 22.80% of the overall score
Data services, affecting 15% of the overall score
YouTube, affecting 13.20% of the overall score
Messaging, affecting 9% of the overall score
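The service weights listed above sum to 100% and can be applied as a weighted sum of per-service scores. The sketch below is illustrative: the dictionary keys, the function name and the example scores are assumptions, and only the weights are taken from the text.

```python
# Service weights from the list above, expressed as fractions of the overall score.
SERVICE_WEIGHTS = {
    "voice": 0.40,
    "web_browsing": 0.228,
    "data_services": 0.15,
    "youtube": 0.132,
    "messaging": 0.09,
}


def weighted_service_score(scores: dict, weights: dict = SERVICE_WEIGHTS) -> float:
    """Combine per-service scores (each in [0, 1]) into one weighted score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same services")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical per-service scores for one operator in one area.
example = {
    "voice": 0.92,
    "web_browsing": 0.85,
    "data_services": 0.78,
    "youtube": 0.88,
    "messaging": 0.95,
}
print(f"weighted score = {weighted_service_score(example):.1%}")
```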

The first step in calculating the single score is to calculate the score for each of the following aggregations:

Cities:

Big Cities
Medium Cities
Small Cities

Roads:

Highways
Main Roads
Rural Roads

Complementary Areas – Walk test:

Hotspots
Railways

The individual area scores are calculated separately using the defined weights and thresholds for the different types of services tested. The final score is a weighted aggregation of the scores obtained in these areas, with weights reflecting their importance. For the Cities aggregation the weights are: Big Cities 60%, Medium Cities 30% and Small Cities 10%. For the Roads aggregation the weights are: Highways 60%, Main Roads 30% and Rural Roads 10%. For Complementary Areas the weights are: Hotspots 60% and Railways 40%. The final score is then calculated from the weighted aggregation scores, which are themselves weighted as follows: Cities 50%, Roads 40% and Complementary Areas 10%. The aggregation over areas and services is shown in Figure 5.

Fig. 5. Aggregation over areas and services

The final score, the same as the scores for each of the individual areas, is presented as a single percentage value which can be interpreted as the level of fulfilling the expectations/capabilities. It ranges from 0 to 100 percent.
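The two-level aggregation over areas described above can be sketched as follows. The weights are those stated in the text; the data structure, the function name and the example area scores are assumptions made for illustration.

```python
# (aggregation weight, {area: weight within the aggregation}), as stated in the text.
AREA_WEIGHTS = {
    "cities": (0.50, {"big_cities": 0.60, "medium_cities": 0.30, "small_cities": 0.10}),
    "roads": (0.40, {"highways": 0.60, "main_roads": 0.30, "rural_roads": 0.10}),
    "complementary": (0.10, {"hotspots": 0.60, "railways": 0.40}),
}


def final_score(area_scores: dict) -> float:
    """Aggregate per-area scores (in percent) into the final score (in percent)."""
    total = 0.0
    for group_weight, members in AREA_WEIGHTS.values():
        # First level: weighted score of one aggregation (Cities, Roads, ...).
        group_score = sum(w * area_scores[area] for area, w in members.items())
        # Second level: contribution of that aggregation to the final score.
        total += group_weight * group_score
    return total


# Hypothetical area scores for one operator, in percent.
example = {
    "big_cities": 90.0, "medium_cities": 85.0, "small_cities": 80.0,
    "highways": 88.0, "main_roads": 75.0, "rural_roads": 60.0,
    "hotspots": 92.0, "railways": 55.0,
}
print(f"final score = {final_score(example):.1f}%")
```

Because every set of weights sums to 100%, the final score stays within 0 to 100 percent whenever the area scores do.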