
Hadoop Performance Evaluation In Cluster Environment

Belay, Fitsum (2017)

 
File: Belay_Fitsum.pdf (1.305 MB)


Lahden ammattikorkeakoulu (Lahti University of Applied Sciences)
All rights reserved
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2017112618180
Abstract
With the growth of the internet, a huge amount of data is produced every second. Companies rely on data analytics to expand their business and stay competitive in the market. Over time, big data analytics technologies have become more affordable for small companies. Unfortunately, small companies usually find it difficult to make the best use of these resources, either because of wrong assumptions about big data or because they cannot meet the infrastructure requirements that big data analysis involves.
There is a general assumption that big data is only for big businesses, which is not true. Companies are usually unable to use their existing infrastructure to implement big data analytics and consequently miss an opportunity for growth. The purpose of this study was to encourage small companies to consider big data in their expansion strategies by showing them how big data analytics can assist a business using existing infrastructure.
One of the objectives of this thesis was to evaluate the performance of a Hadoop cluster in terms of input/output (I/O). This test gives a preliminary idea of how fast the cluster performs in terms of I/O and data throughput. The performance can be measured by feeding data sets of different sizes and changing the number of DataNodes in the cluster. Throughout the whole process, the Hadoop core components were investigated.
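The abstract reports results as "average throughput" without defining it. Hadoop's bundled TestDFSIO benchmark, which the abstract does not name and which is only assumed here as a reference point, reports two related figures: an aggregate throughput (total data divided by the summed task time) and an average I/O rate (the mean of each task's own rate). The sketch below computes both from per-task measurements; `TaskResult` and `summarize` are hypothetical names for illustration, not part of Hadoop or of the thesis.

```python
from dataclasses import dataclass
from math import sqrt

MB = 1024 * 1024  # bytes per megabyte (binary), an assumed unit convention


@dataclass
class TaskResult:
    """One task's I/O measurement (hypothetical record type)."""
    bytes_processed: int
    elapsed_ms: int


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate per-task results into TestDFSIO-style metrics (a sketch)."""
    if not results:
        raise ValueError("need at least one task result")
    if any(r.elapsed_ms <= 0 for r in results):
        raise ValueError("elapsed time must be positive")

    total_mb = sum(r.bytes_processed for r in results) / MB
    total_s = sum(r.elapsed_ms for r in results) / 1000.0

    # Aggregate throughput: all data divided by the summed task time,
    # so long-running tasks weigh more heavily.
    throughput = total_mb / total_s

    # Average I/O rate: each task's own MB/s, averaged with equal weight.
    rates = [(r.bytes_processed / MB) / (r.elapsed_ms / 1000.0) for r in results]
    avg_rate = sum(rates) / len(rates)

    # Population standard deviation of the per-task rates.
    variance = sum(x * x for x in rates) / len(rates) - avg_rate ** 2
    std_dev = sqrt(abs(variance))

    return {
        "throughput_mb_s": throughput,
        "avg_io_rate_mb_s": avg_rate,
        "io_rate_std_dev": std_dev,
    }
```

For example, two tasks that each process 128 MB, in 2 s and 4 s, give an aggregate throughput of about 42.7 MB/s but an average I/O rate of 48 MB/s, because the second metric weights each task equally regardless of how long it ran. When comparing single-node and multi-node runs, it is worth stating which of the two figures is being compared.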
According to the results, a multi-node cluster achieves better average throughput than a single-node Hadoop cluster. It can be concluded that even with inexpensive infrastructure, it is possible to process large volumes of data by optimizing the existing resources.
Several factors affect the performance of a cluster, including the number of files the cluster handles and the processing power of the nodes. However, network and hardware factors that might degrade performance were not considered in this thesis.
Collections
  • Theses
Theses and publications of universities of applied sciences