Is it weird for a DBA to be interested in Data Science or even want to change their career to become a Data Scientist?
Let’s say you decide to jump in and learn Data Science today – where do you even begin?
It turns out I am not the only one in this boat. A quick search on Quora shows that many DBAs contemplate a career in Data Science.
Now that we’ve got that out of the way, let’s get you started on Data Science.
What is Data Science?
Data Science is:
- Extracting useful information from Data
- Using this information to drive business outcomes
Basic Elements of Data Science
- Data Sets
An entity is the person or thing that we are researching. Entities are derived directly from your Data Science project goal.
Project Goal 1: Determine what the customer churn rate of a company will be in August 2017.
Entity: Customers
Project Goal 2: Who will win the US election in 2020?
Entity: Presidential Candidates
If an entity is the thing we are researching, then characteristics are the attributes of THAT thing.
If your entity is a customer, their characteristics or attributes are things like age, gender, income, and interests.
We also don’t care about all of a customer’s characteristics. We are concerned only with the characteristics that are relevant to our project goal.
For an insurance company, a customer’s driving history is relevant. It is not so relevant for a company producing food products.
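To make "relevant characteristics" a bit more concrete, here is a minimal Python sketch. The customer record, the attribute names, and the `RELEVANT_ATTRIBUTES` mapping are all made up for illustration; a real project would derive them from its own goal.

```python
# Hypothetical customer record; every value here is invented for illustration.
customer = {
    "age": 34,
    "gender": "F",
    "income": 72000,
    "interests": ["cooking", "running"],
    "driving_history": "no accidents",
}

# Assumed mapping from a project to the attributes that matter for it.
RELEVANT_ATTRIBUTES = {
    "car_insurance": ["age", "driving_history"],
    "food_products": ["age", "income", "interests"],
}


def relevant_characteristics(entity, project):
    """Keep only the attributes that matter for the given project."""
    wanted = RELEVANT_ATTRIBUTES[project]
    return {name: entity[name] for name in wanted if name in entity}


insurance_view = relevant_characteristics(customer, "car_insurance")
food_view = relevant_characteristics(customer, "food_products")
```

The same customer produces two different views: the insurance view keeps the driving history, while the food products view drops it.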
An event is a situation in which entities participate.
– It could be a simple email offer
– It could be a sales call
– It could be a free seminar where a company’s product is demonstrated. And so on.
The environment is pretty straightforward: it is where an entity operates. Entities function in multiple environments.
Examples: Home, Street, Store, Website etc.
Behaviors are actions that entities perform.
Examples: Browsing, reading, exercising, running, etc.
Every event has some sort of outcome.
Example 1: In an event where a customer receives an email offering a chance to buy a product, the outcome is whether the customer buys the product or not. This type of outcome is called a boolean outcome.
Example 2: In an event where a customer stands on a weighing scale, their weight is the outcome. Yes, outcomes can be continuous values too.
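Here is a small Python sketch of the two outcome types. The numbers are made up purely for illustration; the point is only that boolean and continuous outcomes are stored and summarized differently.

```python
from statistics import mean

# Made-up outcomes, for illustration only.
bought_after_email = [True, False, False, True, False]  # boolean outcomes
weights_kg = [68.5, 72.1, 80.3, 59.9]                   # continuous outcomes

# A boolean outcome is often summarized as a rate (the share of True values).
purchase_rate = sum(bought_after_email) / len(bought_after_email)

# A continuous outcome is often summarized with something like an average.
average_weight = mean(weights_kg)

print(f"Purchase rate: {purchase_rate:.0%}")
print(f"Average weight: {average_weight:.1f} kg")
```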
Relationships tell you how an entity’s attributes influence certain outcomes. These relationships need not hold true 100% of the time.
In winter (environment) a person (entity) is way more likely to buy a sweater or a jacket than in summer.
Example 1: Temperature ⬇️ Sweater sales ⬆️
Example 2: Car’s age ⬆️ Maintenance cost ⬆️
Example 3: Fed interest rate ⬆️ Home mortgage interest rate ⬆️
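One common way to put a number on a relationship like these is the Pearson correlation coefficient, which is negative when one value goes down as the other goes up and positive when they move together. The sketch below uses invented numbers for the first two examples; it is only meant to show the direction of each relationship, not real sales or cost data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented numbers, for illustration only.
temperature_c = [30, 25, 18, 10, 2]
sweater_sales = [5, 12, 30, 55, 80]

car_age_years = [1, 3, 5, 8, 12]
maintenance_cost = [200, 350, 500, 900, 1400]

r_sweaters = pearson(temperature_c, sweater_sales)  # negative: colder, more sales
r_cars = pearson(car_age_years, maintenance_cost)   # positive: older, costlier
```

A value close to -1 or +1 suggests a strong relationship, but as noted above, it still will not hold for every single customer or car.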
An observation is just a system record of all of the above. It could be stored in a database, a file, etc.
You (the customer) shopping (the behavior) in an Apple Store (the environment) on Thanksgiving (the event) and deciding (behavior) to make a purchase (behavior) is an ideal Data Science case. When Apple releases a new iPhone, their sales go up (relationship). Your transaction is recorded in their system with most of the above details (observation).
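As a rough sketch, a single observation from this example could be recorded like this in Python. The `Observation` class and its field names are my own assumptions for illustration, not a standard schema, and treating the purchase as a boolean outcome is also an assumption of this sketch.

```python
from dataclasses import dataclass, asdict


@dataclass
class Observation:
    """One recorded row; the field names are hypothetical."""
    entity: str
    environment: str
    event: str
    behaviors: list
    purchased: bool  # assumed boolean outcome


obs = Observation(
    entity="customer",
    environment="Apple Store",
    event="Thanksgiving",
    behaviors=["shopping", "deciding"],
    purchased=True,
)

# Convert to a plain dict, e.g. before writing it to a database or a file.
record = asdict(obs)
```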
There are 3 kinds of Data Sets:
- Structured Data
  - Database record, application form, etc.
- Semi-structured Data
  - Email message with a mix of structured and unstructured data
- Unstructured Data
  - Tweets, Facebook comments, chat conversations, log file output, etc.
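The difference between the three kinds is easier to see in code. Below is a minimal Python sketch with made-up sample data: a database-like record, an email parsed with the standard library `email` module, and a tweet-like string. The names and values are invented for illustration.

```python
from email import message_from_string

# Structured: every field has a known name and type, like a database record.
structured = {"customer_id": 101, "name": "Jane Doe", "age": 34}

# Semi-structured: an email has structured headers and a free-text body.
raw_email = (
    "From: jane@example.com\n"
    "To: support@example.com\n"
    "Subject: Order question\n"
    "\n"
    "Hi, my order has not arrived yet. Can you check?\n"
)
message = message_from_string(raw_email)
sender = message["From"]      # structured part: look it up by header name
body = message.get_payload()  # unstructured part: plain free text

# Unstructured: just text; any structure has to be extracted by us.
tweet = "Loving my new phone, battery life could be better though!"
words = tweet.lower().split()
```

With structured data you can ask for a field by name; with unstructured data you first have to do some work, such as splitting text into words, before you can analyze it.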