# Down to the Bottom – Weights Update When Minimizing the Error of the Cost Function for Linear Regression

The cost function for Linear Regression is the Mean Squared Error: $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left ( \hat{y}^{(i)}-y^{(i)} \right )^{2} = \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}\left ( x^{(i)}\right ) -y^{(i)} \right )^{2}$ where $x^{(i)}$ is the $i$-th data point in the training dataset and $m$ is the number of samples. $h_{\theta}\left ( x^{(i)}\right )$ is a linear function of the weights and the data input: $h_{\theta}(x)=\theta^{T}x = \theta_{0}x_{0}+ \theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{3}+\cdot\cdot\cdot+\theta_{n}x_{n}$

To find the best weights that minimize the error, we use Gradient Descent to update the weights. If you have been following machine learning courses, e.g. the Machine Learning course on Coursera by Andrew Ng, you will have learned that to update the weights, you need to repeat the step below until it converges: $\theta_{j} = \theta_{j} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_j^{(i)}$ for $j=0\ldots n$ ($n$ features plus the bias term $x_0$), updating all $\theta_j$ simultaneously.
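To make the update rule concrete, here is a minimal vectorized sketch in NumPy. The data, learning rate, and variable names are my own illustrative choices, not from the course:

```python
import numpy as np

# Toy data: m = 3 samples, a bias column x0 = 1 plus 4 features.
X = np.array([[1.0, 2.0, 0.5, 1.5, 3.0],
              [1.0, 1.0, 2.5, 0.5, 2.0],
              [1.0, 3.0, 1.5, 2.0, 1.0]])
y = np.array([10.0, 8.0, 12.0])

theta = np.zeros(5)   # theta_0 .. theta_4
alpha = 0.01          # learning rate
m = len(y)

# One gradient descent step: theta_j -= alpha * (1/m) * sum_i (h - y) * x_j
h = X @ theta                              # predictions for all samples
theta = theta - alpha * (1 / m) * (X.T @ (h - y))
```

Note that `X.T @ (h - y)` computes $\sum_{i}(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_j^{(i)}$ for every $j$ at once, which is what makes the simultaneous update a single line.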

In Andrew Ng’s course, it is also expanded to: $\theta_{0} = \theta_{0} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_0^{(i)}$ $\theta_{1} = \theta_{1} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_1^{(i)}$ $\theta_{2} = \theta_{2} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_2^{(i)}$ $\cdots$ $\theta_{j} = \theta_{j} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_j^{(i)}$

However, when I first studied the course a couple of years ago, I got stuck for a while trying to figure out exactly where that came from. I wish someone had given me a more concrete expansion so I could have figured it out faster. Let me do that here so you can examine the detailed breakdown and get through this stage quickly. That’s the whole objective of this blog post.

Let’s make this less abstract by putting down exact data points with small sample and feature sizes. Let’s say we have a dataset with 4 features (plus the bias column $x_0$), and we select only 3 samples from it for simplicity. $\begin{bmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\ x_0^{(3)} & x_1^{(3)} & x_2^{(3)} & x_3^{(3)} & x_4^{(3)} \end{bmatrix} \cdot\begin{bmatrix}\theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3\\ \theta_4 \end{bmatrix}$

The prediction for each sample is: $\hat{y}^{(1)} = \theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}$ $\hat{y}^{(2)} = \theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)}$ $\hat{y}^{(3)} = \theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)}$
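These three predictions are exactly the matrix-vector product shown above. A quick sketch with made-up numbers (my own, purely for illustration):

```python
import numpy as np

# 3 samples x 5 columns (x0 is the bias term, fixed at 1)
X = np.array([[1.0, 2.0, 0.5, 1.5, 3.0],
              [1.0, 1.0, 2.5, 0.5, 2.0],
              [1.0, 3.0, 1.5, 2.0, 1.0]])
theta = np.array([0.5, 1.0, 2.0, 0.1, 0.3])

# All three predictions at once: y_hat^(i) = theta^T x^(i)
y_hat = X @ theta

# The same thing written out term by term for sample 1:
# y_hat^(1) = theta_0*x0 + theta_1*x1 + theta_2*x2 + theta_3*x3 + theta_4*x4
y_hat_1 = sum(theta[j] * X[0, j] for j in range(5))
```

The two computations agree, which is why the matrix form and the expanded per-sample form are interchangeable.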

In this case the cost function would be: $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left ( \hat{y}^{(i)}-y^{(i)} \right )^{2} = \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}\left ( x^{(i)}\right ) -y^{(i)} \right )^{2}$ With $m=3$, the summed squared error (I will drop the constant $\frac{1}{m}$ factor from here on, since it only scales the gradient) expands to: $((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)})-y^{(1)})^2$ $+((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)})^2$ $+((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)})^2$

To minimize the cost, we need the partial derivative of $J(\theta)$ with respect to each $\theta_j$. The rule to update the weights is $\theta_j = \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$

Let’s find the derivative with respect to each $\theta$ here, step by step. From the cost function above, by applying the Chain Rule, we have: $\frac{\partial}{\partial\theta_0} J(\theta) =$ $2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{0}^{(1)}$ $+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{0}^{(2)}$ $+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{0}^{(3)}$ $= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_0^{(i)}$

Do the same for the rest of the $\theta$s. $\theta_1$: $\frac{\partial}{\partial\theta_1} J(\theta) =$ $2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{1}^{(1)}$ $+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{1}^{(2)}$ $+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{1}^{(3)}$ $= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_1^{(i)}$ $\theta_2$: $\frac{\partial}{\partial\theta_2} J(\theta) =$ $2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{2}^{(1)}$ $+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{2}^{(2)}$ $+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{2}^{(3)}$ $= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_2^{(i)}$ $\theta_3$: $\frac{\partial}{\partial\theta_3} J(\theta) =$ $2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{3}^{(1)}$ $+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{3}^{(2)}$ $+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{3}^{(3)}$ $= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_3^{(i)}$ $\theta_4$: $\frac{\partial}{\partial\theta_4} J(\theta) =$ $2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{4}^{(1)}$ $+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{4}^{(2)}$ $+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{4}^{(3)}$ $= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_4^{(i)}$ (Note the factor of 2 that does not appear in the update rule at the top: since it is a constant, it can be absorbed into the learning rate $\alpha$. Many courses, including Andrew Ng’s, define the cost with $\frac{1}{2m}$ instead of $\frac{1}{m}$ precisely so that this 2 cancels when differentiating.)
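If you want to convince yourself the closed-form gradients above are correct, you can compare them against numerical gradients computed by finite differences. A minimal sketch (toy numbers of my own; the cost here is the plain sum of squared errors, matching the expansion above):

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.5, 1.5, 3.0],
              [1.0, 1.0, 2.5, 0.5, 2.0],
              [1.0, 3.0, 1.5, 2.0, 1.0]])
y = np.array([10.0, 8.0, 12.0])
theta = np.array([0.5, 1.0, 2.0, 0.1, 0.3])

def cost(t):
    # Sum of squared errors, as in the expansion above (1/m factor omitted)
    return np.sum((X @ t - y) ** 2)

# Analytic gradient: dJ/dtheta_j = sum_i 2*(y_hat^(i) - y^(i)) * x_j^(i)
analytic = 2 * X.T @ (X @ theta - y)

# Numerical gradient via central differences
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(len(theta)):
    up, down = theta.copy(), theta.copy()
    up[j] += eps
    down[j] -= eps
    numeric[j] = (cost(up) - cost(down)) / (2 * eps)
```

The two gradients agree to numerical precision, which confirms the per-$\theta_j$ derivations above.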

If you were at first unclear about the functions and notation in the courses or other documentation, I hope the expansion above helps you figure it out.

# A User Story in an Architect’s Eyes

Not too tedious a process from the user’s perspective, is it? Did I even mention anything about VPN? That belongs to the past generation. If you implement SSO, it can be even faster, as the login steps are no longer needed.

The entire process is secure. Try copying the content of the article out. It cannot be done! Try opening another corp app that you could previously access through VPN. It is not accessible!

In the eyes of an architect, the process above is like below:

What’s involved:

• Microsoft Intune

# IT Books that Teach You “Best Practices”

I always love books that not only tell you how to do something, but also emphasize the important best practices to follow and the good habits to keep. This time I would like to recommend two such books.

PowerShell in Depth 2nd Edition

Supercharge Power BI

# Succeed with Speed in Data Migration Projects

If you work in the IT department of an organization or are employed by an IT service provider, you have likely been involved in a data migration project, which most of the time comes with a very tight timeline. The question is: how can you meet that timeline? For example, how can you migrate 10 TB of data (structured and unstructured) from System A to System B within 3 months?

It is often said that to succeed in an IT project, you need three things: People, Process, Tool. In this article, I would like to add some insights into some of the underestimated factors that are essential for a successful migration.

When it comes to migration speed, too many people ask what software to use for a fast migration, as if there were a magic tool that could make it happen in the blink of an eye. The reality is that the tool does not matter as much. The more crucial factors are the infrastructure and the migration team.

• Infrastructure. Network latency and bandwidth, disk I/O, CPU, RAM, etc. In many of the migration projects I’ve been involved in or learned about, infrastructure is often underestimated, yet it turns out to be the shortest stave of the barrel when all other factors are in place. And infrastructure is usually harder to change than the other factors. If your infrastructure is not strong enough to support the goal of the project, see how far you can improve it. Otherwise, even with a team of superb drivers, if the truck you give them is too small to carry the load, you cannot expect the project to finish on time.
• People. This is intangible but also often under-appreciated. (Experienced consultants know what I am talking about.) An experienced team knows how best to finish the project with what’s available at hand. Things are never perfect. Mastery is about making the most of what you have. Following a user guide step by step does not cut it in a migration project with a committed timeline.

To summarize, to succeed with speed, you need the right PEOPLE, PROCESS and TOOL with a good INFRASTRUCTURE.

# Blockchain and Cryptocurrencies 102

If this is the first time you have heard about Blockchain or cryptocurrencies (which is unlikely), you may want to look for a real 101 intro on YouTube or just Google it. However, if you have been hearing about it and would like to know more, this article can give you some directions.

For technical folks, you will want to know how Blockchain actually works. No demo shows it more effectively than the one built by Anders. Check the two links below, which lead you to the demo videos as well as a demo environment you can play with.

There is also a must-read, the whitepaper that declared the birth of Bitcoin: https://bitcoin.org/bitcoin.pdf

For non-tech folks who are considering investing in cryptocurrencies, I won’t be able to give you any advice on whether you should do it, or what type of cryptocurrency to invest in and when. I am currently losing money myself, hahaha… But here is some more information for you to explore:

Find the right wallet for yourself:

Other general info:

# Design with the Mind in Mind

All designs serve the purpose of getting information across to the audience efficiently. Therefore, the design principles are derived from how humans perceive, process, and memorize information.

### Visual Perception

The first aspect to work with is how humans visually perceive the information presented to them. Human eyes and brains are very powerful at recognizing:

• Features such as: color, value, angle, slope, length, texture and motion.
• Patterns such as: proximity, closure and continuation, symmetry, similarity, common area etc.

With the characteristics above in mind, here are the design principles that we can derive:

• Use “pop out” to attract attention. This means using contrast and emphasis with visual features to make the important elements stand out. One of the ways a rotating banner attracts attention is through motion. Usually on a webpage, there is only one banner or web part that rotates while other elements stay still.
• Associate and organize items for skippability.

Nielsen Norman Group conducted a study of how people read on a webpage by tracking their eye fixations. A common fixation pattern, the F-pattern, is shown in the picture below. Users start from the first row, reading most of its content, then move down to the next row. However, as they move down, they tend to read less and less of each row. If information does not appear in the first few lines, it is buried in the page with little visibility to the reader. This usually happens when no clear guidance is provided and the user cannot easily discover the information they need. That is to say, this is a pattern to be avoided. More on NNGroup’s latest article.

An alternative pattern that indicates a good reading experience is the layer-cake gaze pattern. Users work their way down the page, stopping to glance at each heading. They either read more of a section if it is relevant, or skip it if it is not. This helps users quickly locate the information they are after, without wasting time searching through the page line by line.

On a homepage of a portal, content is usually organized into sections ( or web parts), and the title of each section is clear and stands out. This is to help the user navigate their way through the page and find the information that’s of interest to them.

### Memory

Humans memorize information with both short-term and long-term memory.

#### Short-term memory

Research conducted by George Miller (1956) shows that 7±2 is the “magic number” for the limit of human capacity for processing information in short-term memory. This means humans can only process 5 to 9 items at a time. Another cognitive scientist, Cowan (2001), even provided evidence that 4±1 is a more realistic figure. So when providing lists and options on a page, it is important to keep the list short. If the list must be long (for example, 20 announcements from 5 different departments, all important to publish), you can group them into subgroups and organize them under tabs.

#### Long-term Memory

People’s long-term memory can also be leveraged for good UX design. The idea is to associate the new thing (the portal) with the audience’s existing knowledge, which makes it easier for users to learn and adopt. This is the idea behind some of the design considerations on the intranet homepage. Users can easily discover the elements below even without the “pop out” effect.

• Search Box at the top right corner.
• A Button to access More Info at the top/bottom right corner of each web part.

These elements exist as part of the convention for a webpage. Therefore, there is no need to emphasize them. On the contrary, it is recommended to neutralize them to minimize the distraction, and to make other important elements stand out on the page.

Miller 1956: http://www.musanim.com/miller1956/

# SharePoint Infrastructure Workshop Topics

I’ve run Infrastructure Workshops for many SharePoint implementation projects over the years. The objective of the workshop is to figure out how the farm(s) will be implemented in the customer’s environment. A few important decisions are made based on the outcome of the workshop:

• Server and storage sizing
• Network placement
• Service account management
• DR strategy
• Farm configuration
• etc.

I am sharing the checklist here in case it helps. 🙂 Feel free to leave your questions or feedback in the comments.

# SharePoint Three-Tier Network Zoning Architecture

This topic came up when some organizations had policies requiring SharePoint to comply with a “three-tier architecture” requirement, meaning WFE, APP, and DB should be in three network zones, and the WFEs CANNOT connect to the DBs directly! It is common to see SharePoint farm servers placed in three zones, but most of the time the WFEs are allowed to connect to the DB server. However, this is not acceptable to organizations that have the “three-tier” policy in place.

In some projects I have seen a proxy server implemented in the APP zone to bridge the WFEs and the DB. It works. However, that proxy can easily become a bottleneck in larger implementations. So far the most common implementation is to place one or more reverse proxies in the Web zone as the web layer, shifting the SharePoint WFEs to the App zone together with the App servers.

{update April 2018} What I propose in this post is an alternative using technologies on SharePoint. However, based on practical experiences, this approach is more theoretical than practical. So far, there is no production implementation yet. {/update April 2018}

This alternative approach leverages SharePoint’s Request Management feature. With it, you may say SharePoint actually supports the three-tier requirement. How Request Manager works is elaborated on TechNet. The design here is to:

• Put the WFEs in the APP Zone so they can connect to the databases directly.
• Put a dedicated request management farm in the Web Zone in between the Load Balancer and the WFEs, with databases in the same zone. This way, the Request Management layer serves as the web layer interacting with user requests. The WFEs in the APP Zone become the content serving components that retrieve data from the databases at the back end, while they are in the secure internal APP network zone.

Q: Is it OK to place a database server of the Request Management farm in the web zone?

A: Yes. There is no user data stored in that database, only farm service configuration data. And the firewall rules should restrict network traffic to the DB server so that it comes only from the Request Management farm servers.

Q: When the actual WFEs in the APP zone serve content back to the users, do they transfer it directly to the users?

A: No, they respond to the users through the Request Management farm. This is actually the key part that makes the architecture above possible, as users are not able to reach the App Zone.

Q: Should Office Web Apps Server be placed in the Web Zone or the same zone as the actual WFEs?

A: It must be reachable by the end users directly, so the Web Zone is more suitable.

PS: The diagram below shows a bad design example that uses a proxy.

# Which VM to Blame?

You have a Hyper-V host whose disk space is reaching its limit even though you have 4 TB of storage. Get-VM does not really tell you how much disk space each VM occupies; the space is mainly taken by the VHDs. I have put together a script that outputs an inventory report of the VMs covering CPU cores, memory settings, and total VHD size. This will give you an idea of which VMs take the most space on your host.

The PowerShell script can be downloaded from TechNet Gallery: Get Disk Space Used by VMs

Below is an example of the report in the PowerShell window. Below is the same report output to HTML.

# Automate it with PowerShell – Search and Replace Strings across Multiple Text-Based Files

When setting up a new demo environment that is similar to your existing ones, you may need to change just one or a few parameters across multiple script files. This tool helps you accomplish the task with a single line of PowerShell.

Function Replace-String {
<#
.SYNOPSIS
Replace-String finds a string of text that matches the criteria across multiple files, and replaces it with the specified new string.
.DESCRIPTION
This command searches through the specified directory or file(s), obtains the content of the files with the Get-Content cmdlet, finds and replaces matching strings within that content, and sets the new text as the content of the original file.
.PARAMETER folderPath
Accepts one directory path. If specified, all the files within the folder (not including the subfolders) will be in scope for the text search.
.PARAMETER file
Accepts the path of one or more files, e.g. "F:\temp\test\profiles.csv"
.PARAMETER oldString
Accepts a string, Regex supported. This specifies the target string to find and to replace.
.PARAMETER newString
Accepts a string. This specifies the new string with which the older strings get replaced.
.EXAMPLE
This example finds all the files within the F:\temp\test folder, and replaces strings that match the pattern "project" followed by two digits with "project25".
Replace-String -folderPath 'F:\temp\test' -oldString "project\d{2}" -newString 'project25'
.EXAMPLE
This example processes the file "F:\temp\test\profiles.csv", replacing strings that match the pattern "project" followed by two digits with "project25".
Replace-String -file 'F:\temp\test\profiles.csv' -oldString "project\d{2}" -newString 'project25'
#>

[CmdletBinding()]
param (
[Parameter(Mandatory=$False)] [string]$folderPath,
[Parameter(Mandatory=$False)] [string[]]$file,
[Parameter(Mandatory=$True)] [string]$oldString,
[Parameter(Mandatory=$True)] [string]$newString
)
#If the user specifies a folder path, find all the files in that folder (not including subfolders), and replace the matching string of text with the new string.
#The reason why the formats are specified is that the Set-Content cmdlet can mess up files that are not text based, such as Office documents and pictures.
#Only use it with files types that can be edited through Notepad.
if ($folderPath -ne '') { Get-ChildItem -Path ($folderPath+"\*") -File -Include *.xml,*.txt,*.ps1,*.csv | ForEach-Object {
(Get-Content $PSItem.FullName) -Replace $oldString,$newString | Set-Content -Path $PSItem.FullName;
}
}
#If a file or multiple files are specified instead of a folder, only find and replace strings within the specified files.
elseif ($file -ne '') { $file | ForEach-Object {
(Get-Content $PSItem) -Replace $oldString,$newString | Set-Content -Path $PSItem;
}
}
elseif (($file -eq '') -AND ($folderPath -eq '') ) {
Write-Host "Warning: You need to specify what file(s) to process! Specify a file or folder path and try again." -ForegroundColor Red;
}
}