Affine Transformation — why 3D matrix for a 2D transformation

Assumption: you know the basics of Linear Transformation by matrix multiplication. If not, this video from 3Blue1Brown is a great intro.

Have you wondered why Affine Transformation typically uses a 3×3 matrix to transform a 2D image? It looks confusing at first, but it is actually a brilliant idea!

If you search around for articles on the topic, you will see that 3×3 matrices are used to perform transformations (scaling, translation, rotation, shearing) on images, which are 2D!

Credit: Cmglee at https://en.wikipedia.org/wiki/Affine_transformation

We know that the location of each pixel in a 2D image can be represented by a 2D vector [x,y], and that the image can be linearly transformed with a 2×2 matrix. So questions naturally arise. Where do the 3×3 matrices come from? A 3×3 matrix is not even compatible with a 2D vector! Why are we using a 3×3 matrix when it seems a 2×2 can do the same?

In this article, I will answer the questions below:

  • Why does Affine Transformation use a 3×3 matrix to transform a 2D image? and
  • more confusing yet, why does OpenCV’s Affine Transformation function cv2.warpAffine() take a 2×3 input transformation matrix?

In linear transformation, a 2×2 matrix is used to perform scaling, shearing, and rotation on a 2D vector [x,y], which is exactly what Affine Transformation does too. Do you see what’s missing? Translation! Multiplying a 2D vector by a 2×2 matrix cannot achieve translation.

In linear transformations such as scaling, shearing, and rotation, the basis vectors stay anchored at the origin (0,0) before and after the transformation. That means the point (0,0) never changes location. To translate the image to a different location, you need to add a vector after the matrix transformation. Therefore, the general expression for Affine Transformation is q = Ap + b, which is:

[q₁] = [a₁₁ a₁₂][p₁] + [b₁]
[q₂]   [a₂₁ a₂₂][p₂]   [b₂]

[p₁, p₂] can be understood as the original location of one pixel of an image, and [q₁, q₂] is its new location after the transformation.

When the vector b = [b₁, b₂] is [0,0], there is no translation, and in that case a 2×2 matrix A is indeed sufficient. When b is not [0,0], the image moves to a different location (aka translation), and b₁ and b₂ determine the new location: b₁ determines how much the location moves along the x axis and b₂ how much it moves along the y axis. This looks a bit cumbersome, doesn’t it? It is a two-step calculation: a matrix multiplication plus a vector addition.
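To make the two-step calculation concrete, here is a minimal NumPy sketch (the matrix A, the vector b, and the point p are made-up example values):

```python
import numpy as np

# Linear part A: scale x by 2, keep y unchanged (made-up example values)
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
# Translation part b: move 3 along x and 1 along y
b = np.array([3.0, 1.0])

p = np.array([1.0, 1.0])  # original pixel location [p1, p2]
q = A @ p + b             # step 1: matrix multiplication; step 2: vector addition
print(q)                  # [5. 2.]
```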

KEY QUESTION: what if we could perform the entire transformation with just one matrix multiplication?

That’s exactly what a 3×3 matrix can do: it combines the multiplication by a 2×2 matrix and the addition of a 2D vector into a single multiplication by a 3×3 matrix!

Here is how it works. The original points on the 2D plane are padded with a 1 in the third axis, becoming (p₁, p₂, 1). This makes them points in 3D space that all lie on the plane p₃ = 1. So the 2D image is still a 2D image; it is just embedded in a 3D space. To visualize it:

When we shear the cube along the z axis, the image does not change shape or size, but it moves to a different location as seen from the x and y axes! That’s exactly how the 3×3 matrix transformation helps.

What happens to the transformation matrix then? It is expanded into this:

[q₁] = [a₁₁ a₁₂ a₁₃][p₁]
[q₂]   [a₂₁ a₂₂ a₂₃][p₂]
[1 ]   [0   0   1  ][1 ]

The key points here:

  • The last row of the transformation matrix A, [0,0,1], makes sure that the points after transformation still lie on the same z = 1 plane.
  • a₁₃ and a₂₃ determine how much the image is shifted along the x and y axes respectively. They are exactly the same as b₁ and b₂ in the 2D vector above.

It is worth noting the two 0s in positions a₃₁ and a₃₂. They stay 0 in Affine Transformation. If either of them is nonzero, the image leaves the z = 1 plane, and when brought back to that plane (by dividing by the third coordinate), it is no longer the same shape. In that case, it is not an Affine Transformation anymore. That is why in OpenCV, the input transformation matrix for the Affine Transformation function cv2.warpAffine() is a 2×3 matrix with only 2 rows: the third row [0, 0, 1] never changes!
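To tie the pieces together, here is the same transformation in homogeneous form, sketched in NumPy rather than OpenCV (the affine parameters are the same made-up example values): one 3×3 multiplication replaces the multiply-then-add, and the 2×3 matrix that cv2.warpAffine() takes is simply the 3×3 matrix with its constant third row dropped.

```python
import numpy as np

# 3x3 affine matrix: a13 = 3 and a23 = 1 carry the translation,
# and the last row [0, 0, 1] keeps points on the z = 1 plane
A = np.array([[2.0, 0.0, 3.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

p = np.array([1.0, 1.0, 1.0])  # the 2D point (1, 1) padded with 1
q = A @ p                      # one multiplication does scaling AND translation
print(q)                       # [5. 2. 1.] -- still on the z = 1 plane

# The 2x3 matrix expected by cv2.warpAffine() is just the first two rows,
# since the third row never changes:
M = A[:2, :]
print(M.shape)                 # (2, 3)
```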

OK, a recap:

  1. Why does Affine Transformation typically use a 3×3 matrix to transform a 2D image?
    To save computation steps, and for elegance (in my opinion): it combines a two-step calculation into one matrix multiplication.
  2. Why is the input transformation matrix of OpenCV’s Affine Transformation function cv2.warpAffine() 2×3 (hopefully no longer confusing)?
    The third row of the transformation matrix always stays [0,0,1], so there is no need to specify it in the function input.

Let me know if the explanation makes sense.



Vendors’ Unique Position in Change Management

In any industry, projects are often delivered with the involvement of a vendor when you are short of resources or expertise for the implementation. Vendors, the good ones of course, contribute to your project with their knowledge, experience, and resources. From the change management perspective, vendors are also in a unique position that brings you extra value. Here are two areas where they provide value that you may have trouble fulfilling within your organization:

  • Access to your leadership.
  • Subject matter expertise.

Executive Sponsorship is one of the three pillars of the Prosci Project Change Triangle (PCT). Based on Prosci’s renowned industry studies over the past 20 years, active and visible sponsorship is identified as the top contributor to change management success. However, one of the challenges Change Managers face in an organization is often inadequate access to the right level of executive sponsor. This is often due to a misunderstanding of change management in the organization and an underestimation of its role and value in projects. It takes time and effort to change that situation if it exists in your organization. Among other efforts, you need to spend time with your sponsor to raise awareness of the value of change management, provide coaching on their role as a sponsor, and set up regular communication channels for feedback loops and continuous engagement. However, you only have a chance to make all these efforts if you have adequate access to your sponsors. If the change you are leading is part of a project delivered with a vendor, you have a handle to leverage for access to your sponsor. Leadership in an organization often keeps a communication channel open to vendors. This is at least another route to your sponsor if you are struggling with your internal paths.

Another area where vendors add value is their expertise. Every change management project is unique. Vendors, again the good ones, strive to succeed in the market through specialization and through knowledge and best practices accumulated over years or even decades of industry experience. They are experts on the technical side of the project in the first place. The good ones are also experts on the people side of the change. Take Microsoft Office 365 adoption as an example: an organization only does it internally once. A Microsoft partner specialized in this domain, however, does it repeatedly, across geographies and industries. When you work with them on, say, a training needs analysis in preparation for a tailored training plan, your vendor often gives you insights that enlighten you and save you effort. When the training plan is built, they also have ready resources with the right expertise to deliver the training.

Where vendors usually fall short, though, is a deep understanding of your organization's structure and the continuous engagement after the project. Therefore, when performing a sponsor assessment for the change initiative, your vendor will need your help in mapping out the structure of the sponsor coalition, and in identifying each person’s level of support for the change and their respective competency level. Another aspect is that project engagements with vendors are usually deliverables-based: once the project is commissioned, the engagement is finished. However, that is often the prime time for the Reinforcement stage of ADKAR®. You often need to rely on your internal resources for this stage, though you can of course involve your vendor in devising the strategy and plans for reinforcing the change before they are let go.

Down to the Bottom – Weights Update When Minimizing the Error of the Cost Function for Linear Regression

The cost function for Linear Regression is the Mean Squared Error. It goes like this:

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left ( \hat{y}^{(i)}-y^{(i)} \right )^{2} = \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}\left ( x^{(i)}\right ) -y^{(i)} \right )^{2}

x^{(i)} is data point i in the training dataset. h_{\theta}\left ( x^{(i)}\right ) is a linear function of the weights and the input data, which is

h_{\theta}(x)=\theta^{T}x = \theta_{0}x_{0}+ \theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{3}+\cdots+\theta_{n}x_{n}

To find the best weights that minimize the error, we use Gradient Descent to update the weights. If you have been following Machine Learning courses, e.g. the Machine Learning course on Coursera by Andrew Ng, you should have learned that to update the weights, you repeat the process below until it converges:

\theta_{j} = \theta_{j} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_j^{(i)} for j=0…n (n features)

In Andrew Ng’s course, it is also expanded to:
\theta_{0} = \theta_{0} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_0^{(i)}
\theta_{1} = \theta_{1} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_1^{(i)}
\theta_{2} = \theta_{2} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_2^{(i)}
\vdots
\theta_{j} = \theta_{j} - \alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x_j^{(i)}

However, when I first studied the course a couple of years ago, I got stuck for a while trying to figure out where exactly that came from. I wish someone had given me a more concrete expansion so I could have figured it out faster. Let me do that here, so you can examine the detailed breakdown and get through this stage quickly. That’s the whole objective of this post.

Let’s make this less abstract by putting down exact data points with small sample and feature sizes. Let’s say we have a dataset with 4 features (plus the bias column x_0), and we select only 3 samples from it for simplicity.
\begin{bmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\ x_0^{(3)} & x_1^{(3)} & x_2^{(3)} & x_3^{(3)} & x_4^{(3)} \end{bmatrix} \cdot \begin{bmatrix}\theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \end{bmatrix}

The prediction for each sample is:

\hat{y}^{(1)} = \theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}
\hat{y}^{(2)} = \theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)}
\hat{y}^{(3)} = \theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)}

In this case, the cost would be (dropping the constant \frac{1}{m} factor, which only scales the cost and does not change where the minimum is):

J(\theta) \propto \sum_{i=1}^{3}\left( h_{\theta}\left( x^{(i)}\right) -y^{(i)} \right)^{2}
= ((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)})-y^{(1)})^2 +((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)})^2 +((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)})^2

To minimize the cost, we need the partial derivative of the cost with respect to each \theta . The rule to update the weights is

\theta_j = \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)
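Before expanding the derivatives by hand, the update rule above can be sketched in NumPy like this (the toy data, the learning rate, and the iteration count are all made-up assumptions, purely for illustration):

```python
import numpy as np

# Made-up toy data: 3 samples, 5 columns (x0 = 1 is the bias term, plus 4 features)
X = np.array([[1.0, 2.0, 1.0, 0.0, 3.0],
              [1.0, 0.0, 1.0, 2.0, 1.0],
              [1.0, 1.0, 0.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])

theta = np.zeros(5)   # initial weights
alpha = 0.01          # learning rate
m = len(y)

for _ in range(10000):
    y_hat = X @ theta                  # h_theta(x^(i)) for every sample at once
    grad = X.T @ (y_hat - y) / m       # all partial derivatives in one line
    theta = theta - alpha * grad       # simultaneous update of every theta_j

print(X @ theta)  # predictions approach y as the cost is minimized
```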

Let’s find the derivative with respect to each \theta here, step by step. Applying the Chain Rule to the cost function above, we have:
\frac{\partial}{\partial\theta_0} J(\theta) =
2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{0}^{(1)}
+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{0}^{(2)}
+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{0}^{(3)}
= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_0^{(i)}

Do the same for the rest of the \theta s

\theta_1:
\frac{\partial}{\partial\theta_1} J(\theta) =
2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{1}^{(1)}
+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{1}^{(2)}
+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{1}^{(3)}
= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_1^{(i)}

\theta_2:
\frac{\partial}{\partial\theta_2} J(\theta) =
2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{2}^{(1)}
+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{2}^{(2)}
+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{2}^{(3)}
= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_2^{(i)}

\theta_3:
\frac{\partial}{\partial\theta_3} J(\theta) =
2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{3}^{(1)}
+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{3}^{(2)}
+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{3}^{(3)}
= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_3^{(i)}

\theta_4:
\frac{\partial}{\partial\theta_4} J(\theta) =
2((\theta_{0}x_{0}^{(1)}+ \theta_{1}x_{1}^{(1)}+\theta_{2}x_{2}^{(1)}+\theta_{3}x_{3}^{(1)}+\theta_{4}x_{4}^{(1)}) - y^{(1)} )\cdot x_{4}^{(1)}
+ 2((\theta_{0}x_{0}^{(2)}+ \theta_{1}x_{1}^{(2)}+\theta_{2}x_{2}^{(2)}+\theta_{3}x_{3}^{(2)}+\theta_{4}x_{4}^{(2)})-y^{(2)} )\cdot x_{4}^{(2)}
+ 2((\theta_{0}x_{0}^{(3)}+ \theta_{1}x_{1}^{(3)}+\theta_{2}x_{2}^{(3)}+\theta_{3}x_{3}^{(3)}+\theta_{4}x_{4}^{(3)})-y^{(3)} )\cdot x_{4}^{(3)}
= \sum_{i=1}^3 2(\hat{y}^{(i)}-y^{(i)})\cdot x_4^{(i)}
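All five expansions follow the same pattern, \frac{\partial}{\partial\theta_j} J = \sum_{i} 2(\hat{y}^{(i)}-y^{(i)})\cdot x_j^{(i)}, which in matrix form is 2X^{T}(X\theta - y). Here is a quick numeric sanity check of that algebra in NumPy, with purely made-up numbers:

```python
import numpy as np

# Made-up numbers purely to verify the algebra: 3 samples, 5 columns (x0..x4)
X = np.array([[1.0, 2.0, 0.0, 1.0, 3.0],
              [1.0, 1.0, 2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 2.0, 2.0]])
y = np.array([2.0, 1.0, 3.0])
theta = np.array([0.5, -1.0, 2.0, 0.0, 1.0])

# Expanded form: for each theta_j, sum 2 * (y_hat - y) * x_j over the samples
grad_expanded = np.zeros(5)
for j in range(5):
    for i in range(3):
        y_hat_i = X[i] @ theta
        grad_expanded[j] += 2 * (y_hat_i - y[i]) * X[i, j]

# Vectorized form of the same five partial derivatives
grad_vectorized = 2 * X.T @ (X @ theta - y)

print(np.allclose(grad_expanded, grad_vectorized))  # True
```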

If you were at first unclear about the functions and notations in the courses or other documentation, I hope the expansion above helps you figure it out. (Note that constant factors such as the 2 from the derivative are typically absorbed into the learning rate \alpha, or cancelled by defining the cost with \frac{1}{2m}, which is why they do not appear in the update rule.)

A User Story in an Architect’s Eyes

You, at home, browsing Facebook on your smartphone, receive a push notification from WhatsApp. It is a message from your boss asking you to check out an article on your Intranet. You remember the link to the article was shared on Microsoft Teams previously, so you open Teams on your phone and find the link in that channel under that team. You click on it. Your browser (Edge) opens. After keying in your username and password, you receive a push notification from your Authenticator app and tap Approve. Now you see the article on your phone.

Not too tedious a process from the user’s perspective, is it? Did I even mention anything about VPN? That belongs to the past generation. If you implement SSO, it can be even faster, as the login steps are no longer needed.

The entire process is secured. Try copying the content of the article out: it cannot be done! Try opening another corporate app that you could previously access through VPN: it is not accessible!

In the eyes of an architect, the process above is like below:

[Diagram: accessing the Intranet from outside, without VPN]

Components involved:

  • ADFS
  • Azure AD Conditional Access
  • Azure AD App Proxy
  • Microsoft Intune

Succeed with Speed in Data Migration Projects

If you work in the IT department of an organization or are employed by an IT service provider, you have likely been involved in a data migration project, which most of the time has a very tight timeline. The question is: how can you meet that timeline? For example, how can you migrate 10 TB of data (structured and unstructured) from System A to System B within 3 months?

It is often said that to succeed in an IT project, you need three things: People, Process, Tool. In this article, I would like to highlight some of the underestimated factors that are essential for a successful migration.

When it comes to migration speed, too many people ask what software to use for a fast migration, as if there were a magic tool that could make it happen in the blink of an eye. The reality is that the tool does not matter that much. The more crucial factors are the infrastructure and the migration team.

  • Infrastructure. Network latency and bandwidth, disk I/O, CPU, RAM, etc. In many of the migration projects I’ve been involved in or learned about, the importance of infrastructure is underestimated, and it turns out to be the limiting factor even when everything else is in place. Infrastructure is also usually harder to change than the other factors. If your infrastructure is not strong enough to support the goal of the project, see how much you can improve it. Otherwise, even with a team of superb drivers, if the truck you give them is too small to carry the load, you cannot expect the project to finish soon.
  • People. This is intangible but also often under-appreciated (experienced consultants know what I am talking about). An experienced team knows how best to finish the project with what’s available at hand. Things are never perfect; mastery is about making the most of what you have. Following a user guide step by step does not cut it in a migration project with a committed timeline.

To summarize: to succeed with speed, you need the right PEOPLE, PROCESS and TOOL on a good INFRASTRUCTURE.

 

Blockchain and Cryptocurrencies 102

If this is the first time you are hearing about Blockchain or cryptocurrencies (which is unlikely), you may want to look for a real 101 intro on YouTube or just Google it. However, if you have been hearing about them and would like to know more, this article can give you some directions.

For technical folks, you will want to know how Blockchain actually works. No demo shows it more effectively than the one built by Anders. Check the two links below, which lead you to demo videos as well as a demo environment you can play with.

There is also a must-read – the Whitepaper that declared the birth of Bitcoin: https://bitcoin.org/bitcoin.pdf

For non-tech folks who are considering investing in cryptocurrencies, I won’t be able to give you any advice on whether you should do it, or what type of cryptocurrency to invest in and when. I am currently losing money myself, hahaha… But here is some more information for you to explore:

Find the right wallet for yourself:

Other general info:

Design with the Mind in Mind

All designs serve the purpose of getting information across to the audience efficiently. Therefore, the design principles are derived from how humans perceive, process, and memorize information.

Visual Perception

The first aspect to work with is how humans visually perceive the information presented to them. Human eyes and brains are very good at recognizing:

  • Features such as: color, value, angle, slope, length, texture and motion.
  • Patterns such as: proximity, closure and continuation, symmetry, similarity, common area etc.

With the characteristics above in mind, here are the design principles that we can derive:

  • Use “pop out” to attract attention. This means using contrast and emphasis through visual features to make the important elements stand out. One of the ways a rotating banner attracts attention is through motion: usually on a webpage, only one banner or web part rotates while the other elements stay still.
  • Associate and organize items for skimmability.

The Nielsen Norman Group conducted a study of how people read on a webpage by tracking their eye fixations. A common eye-fixation pattern, the F-pattern, is shown in the picture below. Users start from the first row, reading most of its content, then move down to the next row. However, as they move down, they tend to read less and less of each row. If information does not appear in the first few lines, it is buried in the page with little visibility to the reader. This usually happens when no clear guidance is provided and the user cannot easily discover the information they need. That is to say, this is a pattern to be avoided. More in NNGroup’s latest article.

[Figure: F-shaped reading pattern, eye-tracking heatmap]

An alternative pattern that indicates a good reading experience is the “layer cake” gaze pattern. Users work their way down the page, stopping to glance at each heading. They read more of a section if it is relevant, or skip it if not. This helps users locate the information they are after quickly, without wasting time searching through the page line by line.

On the homepage of a portal, content is usually organized into sections (or web parts), and the title of each section is clear and stands out. This helps the user navigate the page and find the information of interest to them.

Memory

Humans memorize information with both short-term and long-term memory.

Short-term memory

Research conducted by George Miller (1956) shows that 7±2 is the “magic number” for the limit on human capacity for processing information in short-term memory[1]. This means humans can only hold 5–9 items at once. Another cognitive scientist, Cowan (2001), even provided evidence that 4±1 is a more realistic figure[2]. So when providing lists and options on a page, it is important to keep each list short. If the list must be long (for example, 20 announcements from 5 different departments, all important to publish), you can group them into subgroups and organize them under tabs.

Long-term Memory

People’s long-term memory can also be leveraged for good UX design: associate the new thing (the portal) with the audience’s existing knowledge, which makes it easier for users to learn and adopt. This is the idea behind some of the design choices on the intranet homepage. Users can easily discover the elements below even without the “pop out” effect.

  • Global Navigation menu at the top of the page.
  • Search Box at the top right corner.
  • A Button to access More Info at the top/bottom right corner of each web part.

These elements are part of the convention for a webpage, so there is no need to emphasize them. On the contrary, it is recommended to neutralize them to minimize distraction and to make the other important elements stand out on the page.

[1] Miller 1956: http://www.musanim.com/miller1956/

[2] Cowan 2001: https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/magical-number-4-in-shortterm-memory-a-reconsideration-of-mental-storage-capacity/44023F1147D4A1D44BDC0AD226838496

 

SharePoint Infrastructure Workshop Topics

I’ve run infrastructure workshops for many SharePoint implementation projects over the years. The objective of the workshop is to figure out how the farm(s) will be implemented in the customer’s environment. A few important decisions are made based on the outcome of the workshop:

  • Server and storage sizing
  • Network placement
  • Service account management
  • DR strategy
  • Farm configuration
  • etc.

I am sharing the checklist here in case it helps. 🙂 Feel free to leave your questions or feedback in the comments.

Infra Workshop Topics

SharePoint Three-Tier Network Zoning Architecture

This topic came up because some organizations have policies that require SharePoint to comply with a “three-tier architecture” requirement, meaning WFE, APP and DB should sit in three network zones, and WFEs CANNOT connect to the DBs directly! It is common to see SharePoint farm servers placed in three zones, but most of the time the WFEs are allowed to connect to the DB server. That, however, is not acceptable to organizations that have the “three-tier” policy in place.

In some projects I saw a proxy server implemented in the APP zone to bridge WFE and DB. It works, but that proxy can easily become a bottleneck in larger implementations. So far the most common implementation is to place one or more reverse proxies in the Web zone as the web layer, shifting the SharePoint WFEs to the App zone together with the App servers.

{update April 2018} What I propose in this post is an alternative using SharePoint’s own technologies. However, based on practical experience, this approach is more theoretical than practical; so far, there is no production implementation. {/update April 2018}

This alternative approach leverages SharePoint’s Request Management feature, with which you could say SharePoint actually supports the three-tier requirement. How Request Manager works is elaborated on TechNet. The design here is to:

  • Put the WFEs in the APP Zone so they can connect to the databases directly.
  • Put a dedicated request management farm in the Web Zone in between the Load Balancer and the WFEs, with databases in the same zone.

[Diagram: a good three-tier architecture]

This way, the Request Management layer serves as the web layer interacting with user requests. The WFEs in the APP Zone become the content serving components that retrieve data from the databases at the back end, while they are in the secure internal APP network zone.

Q: Is it OK to place a database server of the Request Management farm in the Web Zone?

A: Yes. There is no user data stored in the database, only farm service configuration data. And the firewall rules should restrict network traffic to the DB server to come only from the Request Management farm servers.

Q: When the actual WFEs in the APP Zone serve content back to the users, do they transfer it directly to the users?

A: No, they respond to the users through the Request Management farm. This is actually the key part that makes the architecture above possible, as users are not able to reach the App Zone.

Q: Should Office Web Apps Server be placed in the Web Zone or the same zone as the actual WFEs?

A: It must be reachable by the end users directly, so it is more suitable to place it in the Web Zone.

PS: The diagram below shows a bad design example that uses a proxy:

[Diagram: a bad three-tier architecture using a proxy]