STATISTICAL COMPUTING


I would like help with the following problems from an undergraduate statistical computing course so that I can learn how to solve them. The assignment is included below, and R must be used.
Assignment 5: SQL and Relational Databases
SQL Code
You must query the database using SQL statements. Rather than bringing large amounts of data back into R and computing there, perform as much of the computation as possible in SQL, within the database. After an initial SQL query, it will sometimes be much simpler, or only possible, to do some of the computations in R. Before resorting to R, however, make every effort to write the most complete SQL query you can.
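As an illustration of pushing the work into the database, the sketch below answers a question-10-style "top billing" count entirely in SQL, so only the final few rows reach R. It is a minimal sketch that assumes the DBI and RSQLite packages are installed and uses hypothetical toy tables (`actors`, `movies`, and `casts` with a billing-position column `ord`), not the real IMDB schema.

```r
library(DBI)

# In-memory toy database. The tables actors, movies, casts and the column
# `ord` (billing position) are hypothetical stand-ins, not the IMDB schema.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "actors", data.frame(id = 1:3, name = c("A", "B", "C")))
dbWriteTable(con, "movies",
             data.frame(id = 1:4, year = c(1990L, 1995L, 1995L, 2001L)))
dbWriteTable(con, "casts", data.frame(
  actor_id = c(1L, 2L, 3L, 1L, 2L, 1L, 3L, 2L),
  movie_id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  ord      = c(1L, 2L, 5L, 1L, 3L, 2L, 1L, 4L)))

# Filtering, joining, counting, sorting and truncating all happen inside
# SQLite; only the final (at most 10-row) result is returned to R.
top <- dbGetQuery(con, "
  SELECT a.name,
         COUNT(*) AS n_top,
         GROUP_CONCAT(m.year) AS years  -- concatenation order is not guaranteed
  FROM casts c
  JOIN actors a ON a.id = c.actor_id
  JOIN movies m ON m.id = c.movie_id
  WHERE c.ord BETWEEN 1 AND 3
  GROUP BY a.id, a.name
  ORDER BY n_top DESC
  LIMIT 10")
print(top)
dbDisconnect(con)
```

The alternative of reading all of `casts` into R and counting with `table()` gives the same answer on toy data, but on a table with millions of rows it moves far more data than necessary.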
R Code
After answering the questions with SQL, also answer questions 9, 10, 11, and 12 using only R commands, with the dbReadTable() function to retrieve entire tables. (If the size of the data is a problem, you can work with a sizable subset of each table.) Show the R code and confirm that it gives the same results as the SQL queries.
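A minimal sketch of this cross-check for a question-9-style count (movies per actor), assuming the DBI and RSQLite packages and two hypothetical toy tables, `actors` and `casts`; the real table and column names come from the schema and will differ.

```r
library(DBI)

# Hypothetical toy tables standing in for the real IMDB tables.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "actors", data.frame(id = 1:3, name = c("A", "B", "C")))
dbWriteTable(con, "casts", data.frame(actor_id = c(1L, 2L, 1L, 3L, 1L, 2L),
                                      movie_id = c(1L, 1L, 2L, 2L, 3L, 3L)))

# SQL version: count distinct movies per actor inside the database.
sql_res <- dbGetQuery(con, "
  SELECT a.name, COUNT(DISTINCT c.movie_id) AS n_movies
  FROM casts c JOIN actors a ON a.id = c.actor_id
  GROUP BY a.id, a.name
  ORDER BY n_movies DESC, a.name
  LIMIT 20")

# R version: pull whole tables with dbReadTable() and compute in R.
actors <- dbReadTable(con, "actors")
casts  <- dbReadTable(con, "casts")
dbDisconnect(con)
per_actor <- tapply(casts$movie_id, casts$actor_id,
                    function(m) length(unique(m)))
r_res <- data.frame(
  name = actors$name[match(as.integer(names(per_actor)), actors$id)],
  n_movies = as.vector(per_actor))
r_res <- head(r_res[order(-r_res$n_movies, r_res$name), ], 20)
rownames(r_res) <- NULL

# Confirm the two approaches agree (comparing values, not storage types).
stopifnot(identical(sql_res$name, r_res$name),
          all(sql_res$n_movies == r_res$n_movies))
```

Sorting on the same keys in both versions (count, then name) matters: without a tie-breaker, SQL and R may order tied actors differently even when the counts agree.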
The Data
We will explore the Internet Movie Database (IMDB), which is available online at imdb.com and offline via its Alternative Interfaces. What we can do with these data is limited, both because they are essentially an entire population and because most of the variables are qualitative rather than quantitative, which leaves few insightful or interesting questions to ask. (We would like additional information such as sales figures, customer feedback, audience ratings, etc.) Nonetheless, we will use the data to explore SQL.
You may access the SQLite database here.
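A minimal sketch of opening an SQLite database from R with the DBI and RSQLite packages and listing its tables and columns, which is one way to inspect the schema directly. The path of the course database is not given, so the example uses a throwaway in-memory database with a hypothetical `movies` table.

```r
library(DBI)

# For the course data, replace ":memory:" with the path to the downloaded
# SQLite file (the path is not specified here).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table so that the example runs on its own.
dbWriteTable(con, "movies",
             data.frame(id = 1:2, title = c("M1", "M2"), year = c(1999L, 2004L)))

# Map each table name to the names of its columns.
tables <- dbListTables(con)
schema <- lapply(setNames(tables, tables), function(tb) dbListFields(con, tb))
print(schema)

# dbGetQuery() sends an SQL statement and returns the result as a data frame.
n_movies <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM movies")$n
dbDisconnect(con)
```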
When learning SQL, it can also be interesting to explore other datasets:
• Lahman’s Baseball Database is excellent (if baseball is your thing). Data Science in R: A Case Studies Approach, available electronically through the UC library both on and off campus, has a chapter on it.
• StackOverflow’s users, questions, and answers. The data are available at https://archive.org/details/stackexchange.
The schema for the database’s tables is displayed in the following picture.

The raw text files were converted to a database using the imdb-to-sql tool, which relies on the regular expressions we studied in the previous assignment. Many thanks to Ameer Ayoub for providing the code used to build the database from the raw files.
Note that “actor” refers to both male and female actors, and “movie” does not include TV series.
Questions
1. How many actors are in the database? How many movies?
2. How far back in time does the database go?
3. What percentage of the actors are male, and what percentage are female?
4. How many entries in the movies table are actual movies, as opposed to TV series, etc.?
5. How many distinct genres are there? What are their names or descriptions?
6. List the ten most common movie genres, along with the number of movies in each.
7. Find all movies containing the word “space.” How many are there? In what year(s) were they released? Who were the top five actors in each of these movies?
8. Has the number of movies in each genre increased or decreased over time? Plot the number of movies released each year, both in total and by genre.
9. Which actors have appeared in the most movies? List the top 20.
10. Which actors have had the most “top billing” roles, that is, been listed 1, 2, or 3 in a movie’s cast? For each actor, include the years in which these movies were released.
11. Which ten actors appeared in the most movies within a single year? Give the actors’ names, the year, and the titles of those movies.
12. Which ten actors (as listed in the aka_names table) have the most aliases?
13. Networks: Choose a (main) actor with at least 20 movie credits. Find every actor who has appeared in a movie with that actor. Then, for each of these co-stars, find all the people who have appeared in a movie with them. Use these results to build a network (graph) of who has appeared with whom, and display it with the statnet or igraph package.
You can do this with individual SQL queries, processing their results in R to construct further SQL queries. In other words, if something is much simpler to do in R, don’t spend too much time trying to write a sophisticated SQL query.
14. Which 10 TV series have the most movie stars as regular cast members?
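One way to approach question 13, sketched under assumptions: the DBI, RSQLite, and igraph packages are installed, and the toy tables `actors` and `casts` (with hypothetical columns `actor_id` and `movie_id`) stand in for the real schema. SQL finds the co-stars with a self-join, R pastes their ids into a second query, and igraph builds and draws the graph.

```r
library(DBI)
library(igraph)

# Toy database; table/column names and ids are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "actors",
             data.frame(id = 1:5, name = c("Main", "B", "C", "D", "E")))
dbWriteTable(con, "casts", data.frame(
  actor_id = c(1L, 2L, 1L, 3L, 2L, 4L, 5L),
  movie_id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)))

# On real data, actors with >= 20 credits could be found with
# GROUP BY actor_id HAVING COUNT(DISTINCT movie_id) >= 20.
main_id <- 1L

# Step 1 (SQL): co-stars of the main actor, via a self-join on casts.
co_stars <- dbGetQuery(con, "
  SELECT DISTINCT c2.actor_id
  FROM casts c1
  JOIN casts c2 ON c1.movie_id = c2.movie_id AND c1.actor_id <> c2.actor_id
  WHERE c1.actor_id = ?", params = list(main_id))$actor_id

# Step 2 (R builds the next SQL query): every pair of actors sharing a movie
# where at least one of the pair is the main actor or one of the co-stars.
# Requiring c1.actor_id < c2.actor_id lists each undirected pair once.
id_list <- paste(c(main_id, co_stars), collapse = ",")
edges <- dbGetQuery(con, sprintf("
  SELECT DISTINCT c1.actor_id AS from_id, c2.actor_id AS to_id
  FROM casts c1
  JOIN casts c2 ON c1.movie_id = c2.movie_id AND c1.actor_id < c2.actor_id
  WHERE c1.actor_id IN (%s) OR c2.actor_id IN (%s)", id_list, id_list))
actors <- dbReadTable(con, "actors")
dbDisconnect(con)

# Step 3 (R): replace ids by names and build an undirected igraph object.
edge_names <- data.frame(from = actors$name[match(edges$from_id, actors$id)],
                         to   = actors$name[match(edges$to_id, actors$id)])
g <- graph_from_data_frame(edge_names, directed = FALSE)
plot(g)
```

Pasting ids into the SQL text is safe here only because they are integers returned by the previous query; with user-supplied values, parameter binding (as in step 1) is preferable.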
Different Database
I built a new database using the imdbpy scripts because the gender information in the imdb_data database produced by imdb-to-sql is incorrect. This gives a different schema (a different set of tables and columns) but a much more complete representation of the IMDB data. The full database is 8.5 GB; after I removed several tables that are not needed for our exploration, it is 7.5 GB.
You may use this new database to improve your results. It comes in two versions: the full database and a reduced version with 8 tables removed.
The schema of the full version is shown below.

 
