
Machine-learning Overview

  • This may not be the right type of question to ask on SO, but I wanted to hear what other people have to say about which factors to consider when implementing machine-learning algorithms in a large enterprise environment.

    One of my goals is to research industry machine-learning solutions that can be tailored to my company's specific needs. As pretty much the only person on my team who has a math background and who has done some background reading on machine-learning algorithms, I'm tasked with explaining and comparing machine-learning solutions in the industry. From what I've gleaned by googling around, it seems that:

    a. Machine-learning and predictive analytics aren't exactly the same thing, so what's inherently different when a company offers predictive analytics software vs. machine-learning software? (e.g. IBM Predictive Analytics vs. Skytree Server)

    b. A lot of popular terminology gets entangled, especially around Big Data, Hadoop, machine-learning, etc. Could anyone clarify the distinction among those terms? From what I've learned, I think the conceptual separation goes like:

    • Machine-learning algorithms
    • Software Implementation
    • Infrastructure to run software on large datasets (Hadoop)

    c. When implementing a solution, do most companies hire consultants from the solution company to help implement the algorithms, or are most algorithms pre-built and any data analyst can use them? Or do we need a team of data scientists, even with the software, to run the algorithms and understand the output?

    I know this is quite a long-winded question(s), but any info would be helpful. It's kind of difficult being the only person who remotely knows anything about this stuff, so I'd love to hear what more experienced and technical people have to say.

      August 27, 2021 12:52 PM IST
  • Regarding Big Data/Hadoop/ML: Big Data is a term that describes the nature of the data you need to deal with. You can usually distinguish big data from "ordinary" data by the so-called 3Vs: Volume, Variety and Velocity. The threshold for "how much volume counts as big data" isn't defined scientifically; it rests on feasibility considerations: if you feel that the amount of data creates a large overhead in maintaining a regular database (MySQL etc.), then you might consider big data solutions. Hadoop is just the most common tool designed to handle big data.

    Machine learning is a subfield of data science that evolved from statistics and computer science. The idea is to let machines learn without being explicitly programmed. In a nutshell, the goal of a learning method is to generalize from past data in order to predict new data. Big data and machine learning are mentioned together because ML techniques require data in order to learn. There is a trend towards big data in the industry, and the nature of big data (unstructured, sparse) means ML algorithms must be fed a lot of it in order to learn.

    Most companies hire data scientists to deal with these tasks, as they require a lot of knowledge in statistics, computer science, algorithms etc. that regular data analysts don't have. Most of a data scientist's job is not "running a ready algorithm"; there is a lot of preparing and statistically analyzing the data before you even start thinking about the algorithms. You don't need to hire a team in advance; it's a function that can grow gradually over time based on needs.
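
    As a rough illustration of "generalizing from past data to predict new data", here is a minimal sketch in plain Python. It fits a straight line to a handful of past observations by ordinary least squares and then uses the line to predict an unseen point. The numbers and the names `fit_line` and `predict` are invented for illustration; a real project would normally use a library rather than hand-written math.

```python
# Hypothetical past observations (values invented for illustration):
# x could be advertising spend, y the resulting sales.
past_x = [1.0, 2.0, 3.0, 4.0, 5.0]
past_y = [3.1, 4.9, 7.2, 9.0, 10.8]


def fit_line(xs, ys):
    """Learn slope and intercept from examples via ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned rule to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(past_x, past_y)   # "learning" step
prediction = predict(model, 6.0)   # "generalizing" step
print(f"learned rule: y = {model[0]:.2f} * x + {model[1]:.2f}")
print(f"prediction for x = 6.0: {prediction:.2f}")
```

    Nobody wrote the rule "multiply by roughly 2 and add 1" into the program; it was estimated from the examples, which is the sense in which the machine "learns".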

      August 31, 2021 12:33 PM IST
  • To answer part c of your question: machine learning has prebuilt algorithms for both supervised and unsupervised methods. To build a solution for an organization, we first have to understand the client's need, and before choosing an algorithm we first choose between supervised and unsupervised learning. If the need is for supervised learning, we first have to do feature engineering, a very important part of supervised learning, which finds the attributes that distinguish the subjects from one another. Then we choose a classification or prediction algorithm, again based on the problem. There are many algorithms to pick from, and choosing the best one depends on your hardware capacity and data-processing capacity. We have a comparison chart for that.

    Unsupervised learning is best when we want to identify anomalies in data or cluster data points that have similar attributes.
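
    The supervised workflow described above (feature engineering first, then a classification algorithm) might look roughly like this in plain Python. Everything here is an assumption made for illustration: the tiny labelled data set, the two hand-picked features, and the choice of a nearest-centroid classifier, which is just one simple option among many.

```python
import math

# Hypothetical labelled examples (invented for illustration).
training_data = [
    ("WIN CASH NOW!!!", "spam"),
    ("FREE PRIZE!! CLICK", "spam"),
    ("Meeting moved to 3pm", "ham"),
    ("Can you review my report?", "ham"),
]


def extract_features(text):
    """Feature engineering: turn raw text into numeric attributes."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    exclamations = float(text.count("!"))
    return (upper_ratio, exclamations)


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(extract_features(text))
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, text):
    """Assign the label whose centroid is closest in Euclidean distance."""
    features = extract_features(text)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = train(training_data)
for message in ("CLAIM YOUR REWARD!!!", "See you at lunch tomorrow"):
    print(message, "->", classify(centroids, message))
```

    Note that the two features are on different scales; a real pipeline would usually normalize them and validate the model on held-out data before trusting it.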
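
    For the anomaly-detection case, a minimal unsupervised sketch could look like the following. There are no labels: the code only looks at how far each value sits from the median, using the modified z-score based on the median absolute deviation. The readings, the function name `find_anomalies`, and the 3.5 threshold (a common rule of thumb) are assumptions made for illustration.

```python
import statistics

# Hypothetical sensor readings with no labels attached (invented for illustration).
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0, 10.2]


def find_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical, nothing stands out
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


print(find_anomalies(readings))
```

    The median-based score is used here instead of a plain mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself in a small sample.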

    Hope this will help you to understand the third part of your question.

      September 1, 2021 1:40 PM IST
  • Machine learning refers to a class of computer algorithms that learn from examples rather than being explicitly programmed to perform a task. Such an algorithm learns to formulate a general rule from a set of concrete examples. ... Machine learning is the basis of much of modern artificial intelligence.
      September 2, 2021 4:47 PM IST